
[RFC,00/42] Introduce memory region concept (including device local memory)

Message ID 20190214145740.14521-1-matthew.auld@intel.com

Message

Matthew Auld Feb. 14, 2019, 2:56 p.m. UTC
In preparation for upcoming devices with device local memory, introduce the
concept of different memory regions, and a simple buddy allocator to manage
them.
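
To give a rough idea of what a "memory region" boils down to, here is an
illustrative sketch. The names and fields below are made up for this
summary and do not match the actual structures added by the series (see
intel_memory_region.h in the diffstat for those):

  /* Illustrative sketch only -- not the structures from this series. */
  #include <stdint.h>

  /* A device can expose several classes of memory to place objects in. */
  enum region_type {
          REGION_SMEM,    /* system memory (shmem-backed) */
          REGION_LMEM,    /* device local memory */
          REGION_STOLEN,  /* memory carved out of system RAM at boot */
  };

  /* Each region is a fixed range of space with its own allocator. */
  struct memory_region {
          enum region_type type;
          uint64_t start;          /* base offset of the region */
          uint64_t size;           /* total size in bytes */
          uint64_t min_page_size;  /* smallest allocation granularity */
          /* per-region allocator state (e.g. a buddy allocator) goes here */
  };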

At the end of the series are a couple of HAX patches which introduce a fake
local memory region for testing purposes. Currently smoke tested on a Skull
Canyon device.

Abdiel Janulgue (13):
  drm/i915: Add memory region information to device_info
  drm/i915/lmem: add helper to get CPU visible pfn
  drm/i915/lmem: support kernel mapping
  drm/i915: Split out GTT fault handler to make it generic
  drm/i915: Set correct vmf source pages for gem objects
  drm/i915: Introduce DRM_I915_GEM_MMAP_OFFSET
  drm/i915: cpu-map based dumb buffers
  drm/i915: Add fill_pages handler for dma_buf imported objects
  UPSTREAM: drm/i915/query: Split out query item checks
  drm/i915/query: Expose memory regions through the query uAPI
  drm/i915: Introduce GEM_OBJECT_SETPARAM with I915_PARAM_MEMORY_REGION
  drm/i915: enumerate and init each supported region
  drm/i915: setup io-mapping for LMEM

Daniele Ceraolo Spurio (5):
  drm/i915: define HAS_MAPPABLE_APERTURE
  drm/i915: do not map aperture if it is not available.
  drm/i915: expose missing map_gtt support to users
  drm/i915: set num_fence_regs to 0 if there is no aperture
  drm/i915: error capture with no ggtt slot

Matthew Auld (23):
  drm/i915: support 1G pages for the 48b PPGTT
  drm/i915: enable platform support for 1G pages
  drm/i915: buddy allocator
  drm/i915: introduce intel_memory_region
  drm/i915/region: support basic eviction
  drm/i915/region: support continuous allocations
  drm/i915/region: support volatile objects
  drm/i915: support creating LMEM objects
  drm/i915/selftests: exercise writes to LMEM
  drm/i915/selftests: exercise huge-pages for LMEM
  drm/i915: support object clearing via blitter engine
  drm/i915: introduce kernel blitter_context
  drm/i915: support copying objects via blitter engine
  drm/i915: support basic object migration
  drm/i915/lmem: support CPU relocations
  drm/i915: add vfunc for pread
  drm/i915/lmem: support pread
  drm/i915/lmem: support pwrite
  drm/i915/lmem: include debugfs metrics
  drm/i915: treat shmem as a region
  drm/i915: treat stolen as a region
  HAX drm/i915: add the fake lmem region
  HAX drm/i915/lmem: default userspace allocations to LMEM

Michal Wajdeczko (1):
  drm/i915: Don't try to place HWS in non-existing mappable region

 arch/x86/kernel/early-quirks.c                |   26 +
 drivers/gpu/drm/i915/Makefile                 |    3 +
 drivers/gpu/drm/i915/i915_debugfs.c           |   31 +
 drivers/gpu/drm/i915/i915_drv.c               |   19 +-
 drivers/gpu/drm/i915/i915_drv.h               |   37 +-
 drivers/gpu/drm/i915/i915_gem.c               | 1005 +++++++++++++++--
 drivers/gpu/drm/i915/i915_gem_buddy.c         |  206 ++++
 drivers/gpu/drm/i915/i915_gem_buddy.h         |  118 ++
 drivers/gpu/drm/i915/i915_gem_context.c       |   13 +
 drivers/gpu/drm/i915/i915_gem_dmabuf.c        |   14 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c    |   67 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c           |  136 ++-
 drivers/gpu/drm/i915/i915_gem_gtt.h           |    4 +-
 drivers/gpu/drm/i915/i915_gem_object.c        |    1 +
 drivers/gpu/drm/i915/i915_gem_object.h        |   43 +-
 drivers/gpu/drm/i915/i915_gem_shrinker.c      |   59 +
 drivers/gpu/drm/i915/i915_gem_stolen.c        |   66 +-
 drivers/gpu/drm/i915/i915_gpu_error.c         |   63 +-
 drivers/gpu/drm/i915/i915_pci.c               |   18 +-
 drivers/gpu/drm/i915/i915_query.c             |   97 +-
 drivers/gpu/drm/i915/intel_device_info.h      |    1 +
 drivers/gpu/drm/i915/intel_engine_cs.c        |    2 +-
 drivers/gpu/drm/i915/intel_gpu_commands.h     |    3 +
 drivers/gpu/drm/i915/intel_memory_region.c    |  286 +++++
 drivers/gpu/drm/i915/intel_memory_region.h    |  140 +++
 drivers/gpu/drm/i915/intel_region_lmem.c      |  388 +++++++
 drivers/gpu/drm/i915/intel_region_lmem.h      |   46 +
 drivers/gpu/drm/i915/selftests/huge_pages.c   |  204 +++-
 .../gpu/drm/i915/selftests/i915_gem_buddy.c   |  209 ++++
 .../gpu/drm/i915/selftests/i915_gem_object.c  |  150 +++
 .../drm/i915/selftests/i915_live_selftests.h  |    1 +
 .../drm/i915/selftests/i915_mock_selftests.h  |    2 +
 .../drm/i915/selftests/intel_memory_region.c  |  911 +++++++++++++++
 .../gpu/drm/i915/selftests/mock_gem_device.c  |    8 +-
 drivers/gpu/drm/i915/selftests/mock_region.c  |   75 ++
 drivers/gpu/drm/i915/selftests/mock_region.h  |   35 +
 include/drm/i915_drm.h                        |    3 +
 include/uapi/drm/i915_drm.h                   |   94 +-
 38 files changed, 4396 insertions(+), 188 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_gem_buddy.c
 create mode 100644 drivers/gpu/drm/i915/i915_gem_buddy.h
 create mode 100644 drivers/gpu/drm/i915/intel_memory_region.c
 create mode 100644 drivers/gpu/drm/i915/intel_memory_region.h
 create mode 100644 drivers/gpu/drm/i915/intel_region_lmem.c
 create mode 100644 drivers/gpu/drm/i915/intel_region_lmem.h
 create mode 100644 drivers/gpu/drm/i915/selftests/i915_gem_buddy.c
 create mode 100644 drivers/gpu/drm/i915/selftests/intel_memory_region.c
 create mode 100644 drivers/gpu/drm/i915/selftests/mock_region.c
 create mode 100644 drivers/gpu/drm/i915/selftests/mock_region.h

Comments

Dave Airlie Feb. 15, 2019, 12:47 a.m. UTC | #1
On Fri, 15 Feb 2019 at 00:57, Matthew Auld <matthew.auld@intel.com> wrote:
>
> In preparation for upcoming devices with device local memory, introduce the
> concept of different memory regions, and a simple buddy allocator to manage
> them.

This is missing the information on why it's not TTM.

Nothing against extending i915 gem off into doing stuff we already
have examples of in tree, but before you do that it would be good to
have the "why we can't use TTM" discussion in public.

Dave.
Joonas Lahtinen Feb. 19, 2019, 1:32 p.m. UTC | #2
+ dri-devel mailing list, especially for the buddy allocator part

Quoting Dave Airlie (2019-02-15 02:47:07)
> On Fri, 15 Feb 2019 at 00:57, Matthew Auld <matthew.auld@intel.com> wrote:
> >
> > In preparation for upcoming devices with device local memory, introduce the
> > concept of different memory regions, and a simple buddy allocator to manage
> > them.
> 
> This is missing the information on why it's not TTM.
> 
> Nothing against extending i915 gem off into doing stuff we already
> have examples off in tree, but before you do that it would be good to
> have a why we can't use TTM discussion in public.

Glad that you asked. It's my fault that it was not included in
the cover letter. I anticipated the question, but was travelling
for a couple of days at the time this was sent. I didn't want
to write a hasty explanation and then disappear, leaving others to
take the heat.

So here goes the less-hasty version:

We did an analysis of the effort needed vs. the benefit gained from
using TTM when this was initially started. The conclusion was that we
already share the interesting bits of code through core DRM, really.

Re-writing the memory handling to TTM would buy us more fine-grained
locking. But that's more a benefit of rewriting the memory handling with
the information we have learned than of rewriting it to use TTM :)

And further, we've been getting rid of struct_mutex at a steady pace
over the past years, so we have a clear path to the fine-grained locking
in the not-so-distant future anyway. With all this we did not see
much to gain from converting over, as the code sharing is already
substantial.

We also wanted to have the buddy allocator instead of a for loop making
drm_mm allocations, to make sure we can keep memory fragmentation
at bay. The intent is to move the buddy allocator to core DRM, to the
benefit of all the drivers, if there is interest from the community. It
has been written as a strictly separate component with that in mind.
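
As a concrete (and heavily simplified) illustration of the buddy idea,
independent of the actual i915_gem_buddy.c implementation: allocation
sizes are rounded up to a power-of-two order and larger free blocks are
split in half on demand, which is what keeps large contiguous spans
available instead of fragmenting them. The sketch below only splits and
never merges on free, so it is a toy, not the series' allocator:

  /* Minimal split-only buddy sketch (illustrative, not the i915 code). */
  #include <stdint.h>
  #include <stdio.h>
  #include <stdlib.h>

  #define MAX_ORDER 20   /* manages a 1 MiB region in 1-byte order-0 units */

  struct block {
          uint64_t offset;
          struct block *next;
  };

  static struct block *free_list[MAX_ORDER + 1];

  static void push(int order, uint64_t offset)
  {
          struct block *b = malloc(sizeof(*b));

          b->offset = offset;
          b->next = free_list[order];
          free_list[order] = b;
  }

  /* Allocate one block of the requested order, splitting larger blocks. */
  static int64_t buddy_alloc(int order)
  {
          struct block *b;
          int64_t offset;
          int i = order;

          while (i <= MAX_ORDER && !free_list[i])
                  i++;
          if (i > MAX_ORDER)
                  return -1;      /* out of space */

          /* Split down: each step returns both halves to the lower order. */
          while (i > order) {
                  b = free_list[i];
                  free_list[i] = b->next;
                  i--;
                  push(i, b->offset + ((uint64_t)1 << i));  /* the "buddy" */
                  push(i, b->offset);
                  free(b);
          }

          b = free_list[order];
          offset = (int64_t)b->offset;
          free_list[order] = b->next;
          free(b);
          return offset;
  }

  int main(void)
  {
          push(MAX_ORDER, 0);  /* the whole region starts as one free block */
          printf("order-4 block at offset %lld\n", (long long)buddy_alloc(4));
          printf("order-4 block at offset %lld\n", (long long)buddy_alloc(4));
          return 0;
  }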

And if you take the buddy allocator out of the patch set, the rest is
mostly just vfuncing things up to be able to have different backing
stores for objects. We took the opportunity to move over to the more
valgrind-friendly mmap while touching things, but it's something we
have been contemplating anyway. And yeah, loads of selftests.
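
As a rough sketch of what "vfuncing things up" means in practice, here is
a hypothetical ops table; this is not the actual i915 object ops layout
touched by the series, just the general shape (the pread/pwrite hooks echo
the pread/pwrite vfunc patches listed in the cover letter):

  /* Hypothetical per-backing-store ops table -- not the i915 code itself. */
  #include <stdint.h>

  struct gem_object;
  struct sg_table;

  struct gem_object_backend_ops {
          /* Populate the object's pages from its backing region. */
          int (*get_pages)(struct gem_object *obj);
          /* Return the pages to the region (may trigger eviction logic). */
          void (*put_pages)(struct gem_object *obj, struct sg_table *pages);
          /* Read/write paths for memory the CPU cannot simply kmap. */
          int (*pread)(struct gem_object *obj, uint64_t offset,
                       void *dst, uint64_t len);
          int (*pwrite)(struct gem_object *obj, uint64_t offset,
                        const void *src, uint64_t len);
  };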

That's really all that needed adding, and most of it is internal to
i915 and not to do with uAPI. This means porting over a userspace
driver doesn't require a substantial rewrite, just adding a few
new IOCTLs to set the preferred backing storage placements.
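
For reference, the uAPI pieces being alluded to are the memory-region
query item and the GEM_OBJECT_SETPARAM / I915_PARAM_MEMORY_REGION patches
listed in the cover letter. Purely as an invented illustration of the
shape of such an interface (these field names do not reflect the actual
include/uapi/drm/i915_drm.h changes):

  /* Invented illustration of a placement param -- not the real uAPI. */
  #include <stdint.h>

  struct example_gem_object_setparam {
          uint32_t handle;  /* GEM handle of the object */
          uint32_t param;   /* which parameter, e.g. a memory-region one */
          uint64_t size;    /* number of entries pointed to by data */
          uint64_t data;    /* user pointer to a priority-ordered list of
                             * region identifiers for object placement */
  };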

All the previous GEM abstractions keep applying, so we did not see
a justification to rewrite the kernel driver and userspace drivers.
It would just have made things look like TTM, when we already
have the important parts of the code shared with TTM drivers
behind the GEM interfaces, which all our drivers sit on top of.

I hope this answered the question to some extent. Feel free to
ask for more details or to suggest what we could have done
differently.

Regards, Joonas

> 
> Dave.
Dave Airlie Feb. 25, 2019, 8:24 p.m. UTC | #3
On Tue, 19 Feb 2019 at 23:32, Joonas Lahtinen
<joonas.lahtinen@linux.intel.com> wrote:
>
> + dri-devel mailing list, especially for the buddy allocator part
>
> Quoting Dave Airlie (2019-02-15 02:47:07)
> > On Fri, 15 Feb 2019 at 00:57, Matthew Auld <matthew.auld@intel.com> wrote:
> > >
> > > In preparation for upcoming devices with device local memory, introduce the
> > > concept of different memory regions, and a simple buddy allocator to manage
> > > them.
> >
> > This is missing the information on why it's not TTM.
> >
> > Nothing against extending i915 gem off into doing stuff we already
> > have examples off in tree, but before you do that it would be good to
> > have a why we can't use TTM discussion in public.
>
> Glad that you asked. It's my fault that it was not included in
> the cover letter. I anticipated the question, but was travelling
> for a couple of days at the time this was sent. I didn't want
> to write a hasty explanation and then disappear, leaving others to
> take the heat.
>
> So here goes the less-hasty version:
>
> We did an analysis on the effort needed vs benefit gained of using
> TTM when this was started initially. The conclusion was that we
> already share the interesting bits of code through core DRM, really.
>
> Re-writing the memory handling to TTM would buy us more fine-grained
> locking. But it's more a trait of rewriting the memory handling with
> the information we have learned, than rewriting it to use TTM :)
>
> And further, we've been getting rid of struct_mutex at a steady phase
> in the past years, so we have a clear path to the fine-grained locking
> already in the not-so-distant future. With all this we did not see
> much gained from converting over, as the code sharing is already
> substantial.
>
> We also wanted to have the buddy allocator instead of a for loop making
> drm_mm allocations to make sure we can keep the memory fragmentation
> at bay. The intent is to move the buddy allocator to core DRM, to the
> benefit of all the drivers, if there is interest from community. It has
> been written as a strictly separate component with that in mind.
>
> And if you take the buddy allocator out of the patch set, the rest is
> mostly just vfuncing things up to be able to have different backing
> storages for objects. We took the opportunity to move over to the more
> valgrind friendly mmap while touching things, but it's something we
> have been contemplating anyway. And yeah, loads of selftests.
>
> That's really all that needed adding, and most of it is internal to
> i915 and not to do with uAPI. This means porting over an userspace
> driver doesn't require a substantial rewrite, but adding new a few
> new IOCTLs to set the preferred backing storage placements.
>
> All the previous GEM abstractions keep applying, so we did not see
> a justification to rewrite the kernel driver and userspace drivers.
> It would have just to made things look like TTM, when we already
> have the important parts of the code shared with TTM drivers
> behind the GEM interfaces which all our drivers sit on top of.

a) You guys should be part of the community as well: if the buddy
allocator is useful in core DRM, get out there and see whether anyone
else has a use case for it, like the GPU scheduler we have now (can
i915 use that yet? :-)

b) However, these last two paragraphs fill me with no confidence that
you've looked at TTM at all. It sounds like you took comments about
TTM made 10 years ago and didn't update them. There should be no
major reason for a uAPI change just because you adopt TTM. TTM has never
had a common uAPI across drivers upstream; one was proposed
initially, more than 10 years ago. All the current TTM-using drivers
except vmware use a GEM-based API as well. TTM is an internal driver
helper for managing pools of RAM.

I'm just not sure what rebuilding a chunk of shared code inside the
i915 driver is buying you, except a transition path into divergence
from all the other discrete RAM drivers. Like the gallium aversion in
userspace, having a TTM aversion in kernel space is going to be the
wrong path, and I'd rather not have to clean it up in 5 years when you
guys eventually realise it.

The i915 GEM code gets rewritten and refactored quite often and has a
bus factor of ickle; if he decided to go elsewhere, you would have a
pile of code that nobody gets. I think having a TTM backend would have
a better long-term effect on your driver's maintainability.

Dave.
Joonas Lahtinen Feb. 26, 2019, 2:35 a.m. UTC | #4
Quoting Dave Airlie (2019-02-25 12:24:48)
> On Tue, 19 Feb 2019 at 23:32, Joonas Lahtinen
> <joonas.lahtinen@linux.intel.com> wrote:
> >
> > + dri-devel mailing list, especially for the buddy allocator part
> >
> > Quoting Dave Airlie (2019-02-15 02:47:07)
> > > On Fri, 15 Feb 2019 at 00:57, Matthew Auld <matthew.auld@intel.com> wrote:
> > > >
> > > > In preparation for upcoming devices with device local memory, introduce the
> > > > concept of different memory regions, and a simple buddy allocator to manage
> > > > them.
> > >
> > > This is missing the information on why it's not TTM.
> > >
> > > Nothing against extending i915 gem off into doing stuff we already
> > > have examples off in tree, but before you do that it would be good to
> > > have a why we can't use TTM discussion in public.
> >
> > Glad that you asked. It's my fault that it was not included in
> > the cover letter. I anticipated the question, but was travelling
> > for a couple of days at the time this was sent. I didn't want
> > to write a hasty explanation and then disappear, leaving others to
> > take the heat.
> >
> > So here goes the less-hasty version:
> >
> > We did an analysis on the effort needed vs benefit gained of using
> > TTM when this was started initially. The conclusion was that we
> > already share the interesting bits of code through core DRM, really.
> >
> > Re-writing the memory handling to TTM would buy us more fine-grained
> > locking. But it's more a trait of rewriting the memory handling with
> > the information we have learned, than rewriting it to use TTM :)
> >
> > And further, we've been getting rid of struct_mutex at a steady phase
> > in the past years, so we have a clear path to the fine-grained locking
> > already in the not-so-distant future. With all this we did not see
> > much gained from converting over, as the code sharing is already
> > substantial.
> >
> > We also wanted to have the buddy allocator instead of a for loop making
> > drm_mm allocations to make sure we can keep the memory fragmentation
> > at bay. The intent is to move the buddy allocator to core DRM, to the
> > benefit of all the drivers, if there is interest from community. It has
> > been written as a strictly separate component with that in mind.
> >
> > And if you take the buddy allocator out of the patch set, the rest is
> > mostly just vfuncing things up to be able to have different backing
> > storages for objects. We took the opportunity to move over to the more
> > valgrind friendly mmap while touching things, but it's something we
> > have been contemplating anyway. And yeah, loads of selftests.
> >
> > That's really all that needed adding, and most of it is internal to
> > i915 and not to do with uAPI. This means porting over an userspace
> > driver doesn't require a substantial rewrite, but adding new a few
> > new IOCTLs to set the preferred backing storage placements.
> >
> > All the previous GEM abstractions keep applying, so we did not see
> > a justification to rewrite the kernel driver and userspace drivers.
> > It would have just to made things look like TTM, when we already
> > have the important parts of the code shared with TTM drivers
> > behind the GEM interfaces which all our drivers sit on top of.
> 
> a) you guys should be the community as well, if the buddy allocator is
> useful in the core DRM get out there and try and see if anyone else
> has a use case for it, like the GPU scheduler we have now (can i915
> use that yet? :-)

Well, the buddy allocator should be useful for anybody wishing to have
allocations that are as physically contiguous as possible. I have
naively assumed that would be almost everyone. So it is only a question
of whether others see the amount of work required to convert over as
justified for them.
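
One way to see why contiguity matters here: the larger GTT page sizes
(such as the 1G pages enabled earlier in the series) can only be used
when the backing storage is physically contiguous and suitably aligned.
A trivial, purely illustrative check of that condition:

  /* Illustrative: can a contiguous chunk be mapped with a huge page size? */
  #include <stdbool.h>
  #include <stdint.h>

  static bool can_use_page_size(uint64_t offset, uint64_t len,
                                uint64_t page_size)
  {
          /* Needs both start alignment and enough contiguous length. */
          return !(offset & (page_size - 1)) && len >= page_size;
  }

The more fragmented a region becomes, the rarer such chunks get, which
is exactly what the buddy allocator is meant to guard against.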

For the common DRM scheduler, I think a solid move from the beginning
would have been to factor out the i915 scheduler, as it was the most
advanced in features :) Now there is a far more trivial common scheduler
core with no easy path to transition to without a feature regression.

We'd have to rewrite many of the more advanced features for that
codebase before we could transition over. It's hard to justify such
work, as it would buy us very little compared to the amount of work.

The situation would be different if there were something to be gained
from switching over. That would have been the case if the more advanced
scheduler had been picked as the shared codebase.

> b) however this last two paragraphs fill me with no confidence that
> you've looked at TTM at all. It sounds like you took comments about
> TTM made 10 years ago, and didn't update them. There should be no
> major reason for a uapi change just because you adopt TTM. TTM hasn't
> ever had a common uapi across drivers upstream, one was proposed
> initially > 10 years ago.

This is partly my confusion about what the question was for, and
partly bad wording on my behalf.

So, an attempt to re-answer: when this effort was started, it was
obvious that the amount of new code required was low (as you can see).
Feedback about what converting to TTM would require vs. give us was
gathered from folks including Daniel and Chris, and the unanimous
decision was that it would not be justifiable.

> All the current TTM using drivers except
> vmware use a GEM based API as well. TTM is an internal driver helper
> for managing pools of RAM.
> 
> I'm just not sure what rebuilding a chunk of shared code inside the
> i915 driver is buying you, except a transition path into divergence
> from all the other discrete RAM drivers.

I guess what I'm trying to say is that there isn't that much code being
added that wouldn't be there anyway. The i915 GEM code had already
grown to what it is before this.

Adding the suggested smaller amount of code vs. doing a much bigger
rewrite is something of a straightforward choice with the number of
platforms and features in flight, especially when the end result is
the same.

> Like the gallium aversion in
> userspace, having a TTM aversion in kernel space is going to be the
> wrong path, and I'd rather not have to clean it up in 5 years when you
> guys eventually realise it.
> 
> The i915 GEM code get rewritten and refactored quite often

Well, we continually try to improve things and accommodate upcoming
product requirements and hardware features. The thing is that we do it
upstream first, so it's hard to avoid the churn.

When moving rapidly, it's hard to project what the shared portion of
the code might be or what exactly it would look like. I'm myself much
more of a believer in extracting the generic portions once the code is
there and working.

The churn would be even bigger if there were a strict requirement to
refactor the common parts out right from the beginning, before
anything is even proven in practice.

That'd just mean it gets much harder to sell doing things upstream
first to the management.

When the code is in place and working, and we still agree that more
code reuse could be beneficial, it will be a more viable thing to do
when there are fewer moving parts.

> and has a
> bus factor of ickle, if he decided to go elsewhere, you will have a
> pile of code that nobody gets, I think having a TTM backend would have
> a better long term effect on your driver maintainability.

Well, as you might notice, the patches are not from ickle.

I'm not saying we couldn't improve code sharing. My current priority
is making sure we can enable the upcoming platforms, which has a much
more direct effect on our short- and long-term driver maintainability
through resourcing :)

If the platform support is not there, it's hard to have the headcount
to work on refactoring the reusable parts of code out.

So I'm hoping we can take an incremental approach to the suggested
code refactorings, one that doesn't stall the delivery of the features.

Regards, Joonas

> Dave.
Alex Deucher Feb. 26, 2019, 5:31 a.m. UTC | #5
On Mon, Feb 25, 2019 at 9:35 PM Joonas Lahtinen
<joonas.lahtinen@linux.intel.com> wrote:
>
> Quoting Dave Airlie (2019-02-25 12:24:48)
> > On Tue, 19 Feb 2019 at 23:32, Joonas Lahtinen
> > <joonas.lahtinen@linux.intel.com> wrote:
> > >
> > > + dri-devel mailing list, especially for the buddy allocator part
> > >
> > > Quoting Dave Airlie (2019-02-15 02:47:07)
> > > > On Fri, 15 Feb 2019 at 00:57, Matthew Auld <matthew.auld@intel.com> wrote:
> > > > >
> > > > > In preparation for upcoming devices with device local memory, introduce the
> > > > > concept of different memory regions, and a simple buddy allocator to manage
> > > > > them.
> > > >
> > > > This is missing the information on why it's not TTM.
> > > >
> > > > Nothing against extending i915 gem off into doing stuff we already
> > > > have examples off in tree, but before you do that it would be good to
> > > > have a why we can't use TTM discussion in public.
> > >
> > > Glad that you asked. It's my fault that it was not included in
> > > the cover letter. I anticipated the question, but was travelling
> > > for a couple of days at the time this was sent. I didn't want
> > > to write a hasty explanation and then disappear, leaving others to
> > > take the heat.
> > >
> > > So here goes the less-hasty version:
> > >
> > > We did an analysis on the effort needed vs benefit gained of using
> > > TTM when this was started initially. The conclusion was that we
> > > already share the interesting bits of code through core DRM, really.
> > >
> > > Re-writing the memory handling to TTM would buy us more fine-grained
> > > locking. But it's more a trait of rewriting the memory handling with
> > > the information we have learned, than rewriting it to use TTM :)
> > >
> > > And further, we've been getting rid of struct_mutex at a steady phase
> > > in the past years, so we have a clear path to the fine-grained locking
> > > already in the not-so-distant future. With all this we did not see
> > > much gained from converting over, as the code sharing is already
> > > substantial.
> > >
> > > We also wanted to have the buddy allocator instead of a for loop making
> > > drm_mm allocations to make sure we can keep the memory fragmentation
> > > at bay. The intent is to move the buddy allocator to core DRM, to the
> > > benefit of all the drivers, if there is interest from community. It has
> > > been written as a strictly separate component with that in mind.
> > >
> > > And if you take the buddy allocator out of the patch set, the rest is
> > > mostly just vfuncing things up to be able to have different backing
> > > storages for objects. We took the opportunity to move over to the more
> > > valgrind friendly mmap while touching things, but it's something we
> > > have been contemplating anyway. And yeah, loads of selftests.
> > >
> > > That's really all that needed adding, and most of it is internal to
> > > i915 and not to do with uAPI. This means porting over an userspace
> > > driver doesn't require a substantial rewrite, but adding new a few
> > > new IOCTLs to set the preferred backing storage placements.
> > >
> > > All the previous GEM abstractions keep applying, so we did not see
> > > a justification to rewrite the kernel driver and userspace drivers.
> > > It would have just to made things look like TTM, when we already
> > > have the important parts of the code shared with TTM drivers
> > > behind the GEM interfaces which all our drivers sit on top of.
> >
> > a) you guys should be the community as well, if the buddy allocator is
> > useful in the core DRM get out there and try and see if anyone else
> > has a use case for it, like the GPU scheduler we have now (can i915
> > use that yet? :-)
>
> Well, the buddy allocator should be useful for anybody wishing to have
> as continuous physical allocations as possible. I have naively assumed
> that would be almost everyone. So it would be only a question if others
> see the amount of work required to convert over is justified for them.
>
> For the common DRM scheduler, I think a solid move from the beginning
> would have been to factor out the i915 scheduler as it was most advanced
> in features :) Now there is a way more trivial common scheduler core with
> no easy path to transition without a feature regression.

Can you elaborate?  What features are missing from the drm gpu scheduler?

>
> We'd have to rewrite many of the more advanced features for that codebase
> before we could transition over. It's hard to justify such work, for
> that it would buy us very little compared to amount of work.
>
> Situation would be different if there was something gained from
> switching over. This would be the situation if the more advanced
> scheduler was picked as the shared codebase.
>
> > b) however this last two paragraphs fill me with no confidence that
> > you've looked at TTM at all. It sounds like you took comments about
> > TTM made 10 years ago, and didn't update them. There should be no
> > major reason for a uapi change just because you adopt TTM. TTM hasn't
> > ever had a common uapi across drivers upstream, one was proposed
> > initially > 10 years ago.
>
> This is one part my confusion on what the question was for and other
> part bad wording on my behalf.
>
> So an attempt to re-answer: When this effort was started it was obvious
> that the amount of new code required was low (as you can see). Feedback
> about what converting to TTM would require vs. give us was gathered from
> folks including Daniel and Chris and the unanimous decision was that it
> would not be justifiable.
>
> > All the current TTM using drivers except
> > vmware use a GEM based API as well. TTM is an internal driver helper
> > for managing pools of RAM.
> >
> > I'm just not sure what rebuilding a chunk of shared code inside the
> > i915 driver is buying you, except a transition path into divergence
> > from all the other discrete RAM drivers.
>
> I guess what I'm trying to say, there isn't that much code being added
> that wouldn't be there anyway. The i915 GEM code has already grown to
> what it is before this.
>
> Adding the suggested smaller amount of code vs. doing a much bigger
> rewrite is something of a straightforward choice with the amount of
> platforms and features in flight, especially when the end result is
> the same.

Because you will probably never do it.  It's almost always easier to
just incrementally add on to existing code.  One could argue that GEM
evolved into more or less the exact same thing as TTM anyway, so why
not bite the bullet and switch at some point?  TTM works fine even for
UMA hardware.

>
> > Like the gallium aversion in
> > userspace, having a TTM aversion in kernel space is going to be the
> > wrong path, and I'd rather not have to clean it up in 5 years when you
> > guys eventually realise it.
> >
> > The i915 GEM code get rewritten and refactored quite often
>
> Well, we continually try to improve things and accommodate upcoming
> product requirements and hardware feature. The thing is that we do it
> upstream first, so hard to avoid the churn.
>
> When moving rapidly, it's hard to project what the shared portion of
> the code might be or exactly look. I'm myself much more believer in
> extracting the generic portions out when the code is there and working.
>
> The churn would be even bigger if there is a strict requirement to
> refactor the common parts out straight from the beginning, before
> anything is even proven in practice.
>
> That'd just mean it gets much harder to sell doing things upstream
> first to the management.

What does upstream first have to do with contributing to shared code?
There is a common misconception in big companies that if you utilize
shared infrastructure it will slow you down or you'll lose control of
your code, which I think is what you are referring to.  Ultimately, it
does sometimes raise the bar, but in the long term it benefits
everyone and usually it doesn't really add that much overhead. It
sounds like you are just feeding that misconception; you can't
contribute to or take advantage of any shared infrastructure because
that might slow you down.  If that is the case, then why does upstream
first even matter?  It seems like the only common code you want to
support is common code that you wrote in the first place.

Alex

>
> When the code is in place and working, and we still agree that more
> code reuse could be beneficial, it will be more viable thing to do
> when there are less moving parts.
>
> > and has a
> > bus factor of ickle, if he decided to go elsewhere, you will have a
> > pile of code that nobody gets, I think having a TTM backend would have
> > a better long term effect on your driver maintainability.
>
> Well, as you might notice, the patches are not from ickle.
>
> I'm not saying we couldn't improve code sharing. I currently have a
> priority in making sure we can enable the upcoming platforms, which has
> a much more direct effect on our short and long term driver
> maintainability through resourcing :)
>
> If the platform support is not there, it's hard to have the headcount
> to work on refactoring the reusable parts of code out.
>
> So I'm hoping we can have an incremental approach to the code
> refactorings suggested, that doesn't stall the delivery of the features.
>
> Regards, Joonas
>
> > Dave.
Jani Nikula Feb. 26, 2019, 10:41 a.m. UTC | #6
So I'm not going to go into technical detail here, which I'll happily
leave to Joonas et al, but I think a couple of general points deserve to
be addressed.

On Tue, 26 Feb 2019, Alex Deucher <alexdeucher@gmail.com> wrote:
> On Mon, Feb 25, 2019 at 9:35 PM Joonas Lahtinen
> <joonas.lahtinen@linux.intel.com> wrote:
>> Adding the suggested smaller amount of code vs. doing a much bigger
>> rewrite is something of a straightforward choice with the amount of
>> platforms and features in flight, especially when the end result is
>> the same.
>
> Because you will probably never do it.  It's almost always easier to
> just incrementally add on to existing code.  One could argue that GEM
> evolved into more or less the exact same thing as TTM anyway so why
> not bite the bullet and switch at some point?  TTM works fine even for
> UMA hardware.

Surely we have lots of faults, but being averse to refactoring,
reworking, and continuously rewriting parts of the driver is not among
them by any stretch. Sometimes it's just for the general
maintainability, sometimes to remodel stuff to neatly plug in support
for that new piece of hardware, and everything in between.

> There is a common misconception in big companies that if you utilize
> shared infrastructure it will slow you down or you'll lose control of
> your code which is I think what you are referring to.  Ultimately, it
> does sometimes raise the bar, but in the long term it benefits
> everyone and usually it doesn't really add that much overhead. It
> sounds like you are just feeding that misconception; you can't
> contribute to or take advantage of any shared infrastructure because
> that might slow you down.  If that is the case, then why does upstream
> first even matter?  It seems like the only common code you want to
> support is common code that you wrote in the first place.

Again, on a general note, without actually checking the stats, I like to
think we're pretty good citizens wrt actively using and contributing to
shared infrastructure and shared uAPI. Especially so for KMS, and even
when it really has slowed us down.

So while you may have fair points about a specific case, and again I'll
let Joonas address the specific case, I'll have to ask you to please not
generalize that to the whole driver.


BR,
Jani.
Joonas Lahtinen Feb. 26, 2019, 12:17 p.m. UTC | #7
Quoting Alex Deucher (2019-02-25 21:31:43)
> On Mon, Feb 25, 2019 at 9:35 PM Joonas Lahtinen
> <joonas.lahtinen@linux.intel.com> wrote:
> >
> > Quoting Dave Airlie (2019-02-25 12:24:48)
> > > On Tue, 19 Feb 2019 at 23:32, Joonas Lahtinen
> > > <joonas.lahtinen@linux.intel.com> wrote:
> > > >
> > > > + dri-devel mailing list, especially for the buddy allocator part
> > > >
> > > > Quoting Dave Airlie (2019-02-15 02:47:07)
> > > > > On Fri, 15 Feb 2019 at 00:57, Matthew Auld <matthew.auld@intel.com> wrote:
> > > > > >
> > > > > > In preparation for upcoming devices with device local memory, introduce the
> > > > > > concept of different memory regions, and a simple buddy allocator to manage
> > > > > > them.
> > > > >
> > > > > This is missing the information on why it's not TTM.
> > > > >
> > > > > Nothing against extending i915 gem off into doing stuff we already
> > > > > have examples off in tree, but before you do that it would be good to
> > > > > have a why we can't use TTM discussion in public.
> > > >
> > > > Glad that you asked. It's my fault that it was not included in
> > > > the cover letter. I anticipated the question, but was travelling
> > > > for a couple of days at the time this was sent. I didn't want
> > > > to write a hasty explanation and then disappear, leaving others to
> > > > take the heat.
> > > >
> > > > So here goes the less-hasty version:
> > > >
> > > > We did an analysis on the effort needed vs benefit gained of using
> > > > TTM when this was started initially. The conclusion was that we
> > > > already share the interesting bits of code through core DRM, really.
> > > >
> > > > Re-writing the memory handling to TTM would buy us more fine-grained
> > > > locking. But it's more a trait of rewriting the memory handling with
> > > > the information we have learned, than rewriting it to use TTM :)
> > > >
> > > > And further, we've been getting rid of struct_mutex at a steady phase
> > > > in the past years, so we have a clear path to the fine-grained locking
> > > > already in the not-so-distant future. With all this we did not see
> > > > much gained from converting over, as the code sharing is already
> > > > substantial.
> > > >
> > > > We also wanted to have the buddy allocator instead of a for loop making
> > > > drm_mm allocations to make sure we can keep the memory fragmentation
> > > > at bay. The intent is to move the buddy allocator to core DRM, to the
> > > > benefit of all the drivers, if there is interest from community. It has
> > > > been written as a strictly separate component with that in mind.
> > > >
> > > > And if you take the buddy allocator out of the patch set, the rest is
> > > > mostly just vfuncing things up to be able to have different backing
> > > > storages for objects. We took the opportunity to move over to the more
> > > > valgrind friendly mmap while touching things, but it's something we
> > > > have been contemplating anyway. And yeah, loads of selftests.
> > > >
> > > > That's really all that needed adding, and most of it is internal to
> > > > i915 and not to do with uAPI. This means porting over an userspace
> > > > driver doesn't require a substantial rewrite, but adding new a few
> > > > new IOCTLs to set the preferred backing storage placements.
> > > >
> > > > All the previous GEM abstractions keep applying, so we did not see
> > > > a justification to rewrite the kernel driver and userspace drivers.
> > > > It would have just to made things look like TTM, when we already
> > > > have the important parts of the code shared with TTM drivers
> > > > behind the GEM interfaces which all our drivers sit on top of.
> > >
> > > a) you guys should be the community as well, if the buddy allocator is
> > > useful in the core DRM get out there and try and see if anyone else
> > > has a use case for it, like the GPU scheduler we have now (can i915
> > > use that yet? :-)
> >
> > Well, the buddy allocator should be useful for anybody wishing to have
> > as continuous physical allocations as possible. I have naively assumed
> > that would be almost everyone. So it would be only a question if others
> > see the amount of work required to convert over is justified for them.
> >
> > For the common DRM scheduler, I think a solid move from the beginning
> > would have been to factor out the i915 scheduler as it was most advanced
> > in features :) Now there is a way more trivial common scheduler core with
> > no easy path to transition without a feature regression.
> 
> Can you elaborate?  What features are missing from the drm gpu scheduler?

Priority-based pre-emption is the big thing that comes to mind. But maybe
we should not derail the discussion in this thread. The discussion should
be in the archives, or we can start a new thread.

> > We'd have to rewrite many of the more advanced features for that codebase
> > before we could transition over. It's hard to justify such work, for
> > that it would buy us very little compared to amount of work.
> >
> > Situation would be different if there was something gained from
> > switching over. This would be the situation if the more advanced
> > scheduler was picked as the shared codebase.
> >
> > > b) however this last two paragraphs fill me with no confidence that
> > > you've looked at TTM at all. It sounds like you took comments about
> > > TTM made 10 years ago, and didn't update them. There should be no
> > > major reason for a uapi change just because you adopt TTM. TTM hasn't
> > > ever had a common uapi across drivers upstream, one was proposed
> > > initially > 10 years ago.
> >
> > This is one part my confusion on what the question was for and other
> > part bad wording on my behalf.
> >
> > So an attempt to re-answer: When this effort was started it was obvious
> > that the amount of new code required was low (as you can see). Feedback
> > about what converting to TTM would require vs. give us was gathered from
> > folks including Daniel and Chris and the unanimous decision was that it
> > would not be justifiable.
> >
> > > All the current TTM using drivers except
> > > vmware use a GEM based API as well. TTM is an internal driver helper
> > > for managing pools of RAM.
> > >
> > > I'm just not sure what rebuilding a chunk of shared code inside the
> > > i915 driver is buying you, except a transition path into divergence
> > > from all the other discrete RAM drivers.
> >
> > I guess what I'm trying to say, there isn't that much code being added
> > that wouldn't be there anyway. The i915 GEM code has already grown to
> > what it is before this.
> >
> > Adding the suggested smaller amount of code vs. doing a much bigger
> > rewrite is something of a straightforward choice with the amount of
> > platforms and features in flight, especially when the end result is
> > the same.
> 
> Because you will probably never do it.  It's almost always easier to
> just incrementally add on to existing code.  One could argue that GEM
> evolved into more or less the exact same thing as TTM anyway so why
> not bite the bullet and switch at some point?  TTM works fine even for
> UMA hardware.

By my understanding there are quite a few differences between the two
in low-on-memory handling and other more subtle aspects.
Converting over to TTM would be a pretty big rewrite of i915, and the
code sharing is already happening for the important parts.

> > > Like the gallium aversion in
> > > userspace, having a TTM aversion in kernel space is going to be the
> > > wrong path, and I'd rather not have to clean it up in 5 years when you
> > > guys eventually realise it.
> > >
> > > The i915 GEM code get rewritten and refactored quite often
> >
> > Well, we continually try to improve things and accommodate upcoming
> > product requirements and hardware feature. The thing is that we do it
> > upstream first, so hard to avoid the churn.
> >
> > When moving rapidly, it's hard to project what the shared portion of
> > the code might be or exactly look. I'm myself much more believer in
> > extracting the generic portions out when the code is there and working.
> >
> > The churn would be even bigger if there is a strict requirement to
> > refactor the common parts out straight from the beginning, before
> > anything is even proven in practice.
> >
> > That'd just mean it gets much harder to sell doing things upstream
> > first to the management.
> 
> What does upstream first have to do with contributing to shared code?
> There is a common misconception in big companies that if you utilize
> shared infrastructure it will slow you down or you'll lose control of
> your code which is I think what you are referring to.

That'd be the wrong impression. Quite the contrary: the biggest chunk
of code here, the buddy allocator, was written specifically so that we
can easily move it to DRM core.

And I think we have a solid track record of contributing to the DRM
core, so I'm not sure why you'd think that.

> Ultimately, it
> does sometimes raise the bar, but in the long term it benefits
> everyone and usually it doesn't really add that much overhead.

That we both agree on. But in the case of this specific patch series,
it is about rewriting the driver to use TTM, which is quite an overhead.

And if most of the important code is shared anyway through DRM core,
doing the rewrite plus covering all the subtle differences does not
sound too compelling.

> It
> sounds like you are just feeding that misconception; you can't
> contribute to or take advantage of any shared infrastructure because
> that might slow you down. If that is the case, then why does upstream
> first even matter?  It seems like the only common code you want to
> support is common code that you wrote in the first place.

That's not the case.

The scheduler code and GEM are the two things with the most dependencies
everywhere in the i915 code. So it's not quite fair to generalize that
if we're not ripping those two out and rewriting the driver, we're not
contributing to shared infrastructure.

If there are some opportunities we've missed, I'm happy to be reminded.
I'd also like to mention that we have the dma-fence tracing and buddy
allocator patches (in this series) that could be of interest to have in
the shared infrastructure.

Regards, Joonas

> Alex
> 
> >
> > When the code is in place and working, and we still agree that more
> > code reuse could be beneficial, it will be more viable thing to do
> > when there are less moving parts.
> >
> > > and has a
> > > bus factor of ickle, if he decided to go elsewhere, you will have a
> > > pile of code that nobody gets, I think having a TTM backend would have
> > > a better long term effect on your driver maintainability.
> >
> > Well, as you might notice, the patches are not from ickle.
> >
> > I'm not saying we couldn't improve code sharing. I currently have a
> > priority in making sure we can enable the upcoming platforms, which has
> > a much more direct effect on our short and long term driver
> > maintainability through resourcing :)
> >
> > If the platform support is not there, it's hard to have the headcount
> > to work on refactoring the reusable parts of code out.
> >
> > So I'm hoping we can have an incremental approach to the code
> > refactorings suggested, that doesn't stall the delivery of the features.
> >
> > Regards, Joonas
> >
> > > Dave.
Alex Deucher Feb. 26, 2019, 5:20 p.m. UTC | #8
On Tue, Feb 26, 2019 at 7:17 AM Joonas Lahtinen
<joonas.lahtinen@linux.intel.com> wrote:
>
> Quoting Alex Deucher (2019-02-25 21:31:43)
> > On Mon, Feb 25, 2019 at 9:35 PM Joonas Lahtinen
> > <joonas.lahtinen@linux.intel.com> wrote:
> > >
> > > Quoting Dave Airlie (2019-02-25 12:24:48)
> > > > On Tue, 19 Feb 2019 at 23:32, Joonas Lahtinen
> > > > <joonas.lahtinen@linux.intel.com> wrote:
> > > > >
> > > > > + dri-devel mailing list, especially for the buddy allocator part
> > > > >
> > > > > Quoting Dave Airlie (2019-02-15 02:47:07)
> > > > > > On Fri, 15 Feb 2019 at 00:57, Matthew Auld <matthew.auld@intel.com> wrote:
> > > > > > >
> > > > > > > In preparation for upcoming devices with device local memory, introduce the
> > > > > > > concept of different memory regions, and a simple buddy allocator to manage
> > > > > > > them.
> > > > > >
> > > > > > This is missing the information on why it's not TTM.
> > > > > >
> > > > > > Nothing against extending i915 gem off into doing stuff we already
> > > > > > have examples off in tree, but before you do that it would be good to
> > > > > > have a why we can't use TTM discussion in public.
> > > > >
> > > > > Glad that you asked. It's my fault that it was not included in
> > > > > the cover letter. I anticipated the question, but was travelling
> > > > > for a couple of days at the time this was sent. I didn't want
> > > > > to write a hasty explanation and then disappear, leaving others to
> > > > > take the heat.
> > > > >
> > > > > So here goes the less-hasty version:
> > > > >
> > > > > We did an analysis on the effort needed vs benefit gained of using
> > > > > TTM when this was started initially. The conclusion was that we
> > > > > already share the interesting bits of code through core DRM, really.
> > > > >
> > > > > Re-writing the memory handling to TTM would buy us more fine-grained
> > > > > locking. But it's more a trait of rewriting the memory handling with
> > > > > the information we have learned, than rewriting it to use TTM :)
> > > > >
> > > > > And further, we've been getting rid of struct_mutex at a steady phase
> > > > > in the past years, so we have a clear path to the fine-grained locking
> > > > > already in the not-so-distant future. With all this we did not see
> > > > > much gained from converting over, as the code sharing is already
> > > > > substantial.
> > > > >
> > > > > We also wanted to have the buddy allocator instead of a for loop making
> > > > > drm_mm allocations to make sure we can keep the memory fragmentation
> > > > > at bay. The intent is to move the buddy allocator to core DRM, to the
> > > > > benefit of all the drivers, if there is interest from community. It has
> > > > > been written as a strictly separate component with that in mind.
> > > > >
> > > > > And if you take the buddy allocator out of the patch set, the rest is
> > > > > mostly just vfuncing things up to be able to have different backing
> > > > > storages for objects. We took the opportunity to move over to the more
> > > > > valgrind friendly mmap while touching things, but it's something we
> > > > > have been contemplating anyway. And yeah, loads of selftests.
> > > > >
> > > > > That's really all that needed adding, and most of it is internal to
> > > > > i915 and not to do with uAPI. This means porting over an userspace
> > > > > driver doesn't require a substantial rewrite, but adding new a few
> > > > > new IOCTLs to set the preferred backing storage placements.
> > > > >
> > > > > All the previous GEM abstractions keep applying, so we did not see
> > > > > a justification to rewrite the kernel driver and userspace drivers.
> > > > > It would have just to made things look like TTM, when we already
> > > > > have the important parts of the code shared with TTM drivers
> > > > > behind the GEM interfaces which all our drivers sit on top of.
> > > >
> > > > a) you guys should be the community as well, if the buddy allocator is
> > > > useful in the core DRM get out there and try and see if anyone else
> > > > has a use case for it, like the GPU scheduler we have now (can i915
> > > > use that yet? :-)
> > >
> > > Well, the buddy allocator should be useful for anybody wishing to have
> > > as continuous physical allocations as possible. I have naively assumed
> > > that would be almost everyone. So it would be only a question if others
> > > see the amount of work required to convert over is justified for them.
> > >
> > > For the common DRM scheduler, I think a solid move from the beginning
> > > would have been to factor out the i915 scheduler as it was most advanced
> > > in features :) Now there is a way more trivial common scheduler core with
> > > no easy path to transition without a feature regression.
> >
> > Can you elaborate?  What features are missing from the drm gpu scheduler?
>
> Priority based pre-emption is the big thing coming to mind. But maybe we
> should not derail the discussion in this thread. The discussion should
> be in the archives, or we can start a new thread.

Probably not worth continuing here, but I think there are probably
features that the drm scheduler has that the i915 scheduler does not.
For example, engine load balancing.  So I wouldn't necessarily say
it's more advanced, just a different feature set.  We could all be
enjoying all of these features if we all worked on the same
infrastructure.

>
> > > We'd have to rewrite many of the more advanced features for that codebase
> > > before we could transition over. It's hard to justify such work, for
> > > that it would buy us very little compared to amount of work.
> > >
> > > Situation would be different if there was something gained from
> > > switching over. This would be the situation if the more advanced
> > > scheduler was picked as the shared codebase.
> > >
> > > > b) however this last two paragraphs fill me with no confidence that
> > > > you've looked at TTM at all. It sounds like you took comments about
> > > > TTM made 10 years ago, and didn't update them. There should be no
> > > > major reason for a uapi change just because you adopt TTM. TTM hasn't
> > > > ever had a common uapi across drivers upstream, one was proposed
> > > > initially > 10 years ago.
> > >
> > > This is one part my confusion on what the question was for and other
> > > part bad wording on my behalf.
> > >
> > > So an attempt to re-answer: When this effort was started it was obvious
> > > that the amount of new code required was low (as you can see). Feedback
> > > about what converting to TTM would require vs. give us was gathered from
> > > folks including Daniel and Chris and the unanimous decision was that it
> > > would not be justifiable.
> > >
> > > > All the current TTM using drivers except
> > > > vmware use a GEM based API as well. TTM is an internal driver helper
> > > > for managing pools of RAM.
> > > >
> > > > I'm just not sure what rebuilding a chunk of shared code inside the
> > > > i915 driver is buying you, except a transition path into divergence
> > > > from all the other discrete RAM drivers.
> > >
> > > I guess what I'm trying to say, there isn't that much code being added
> > > that wouldn't be there anyway. The i915 GEM code has already grown to
> > > what it is before this.
> > >
> > > Adding the suggested smaller amount of code vs. doing a much bigger
> > > rewrite is something of a straightforward choice with the amount of
> > > platforms and features in flight, especially when the end result is
> > > the same.
> >
> > Because you will probably never do it.  It's almost always easier to
> > just incrementally add on to existing code.  One could argue that GEM
> > evolved into more or less the exact same thing as TTM anyway so why
> > not bite the bullet and switch at some point?  TTM works fine even for
> > UMA hardware.
>
> By my understanding there are quite a few differences in low on memory
> handling and other more subtle aspects between the two.
> Converting over to TTM would be a pretty big rewrite of i915, and the
> code sharing is already happening for important parts.
>
> > > > Like the gallium aversion in
> > > > userspace, having a TTM aversion in kernel space is going to be the
> > > > wrong path, and I'd rather not have to clean it up in 5 years when you
> > > > guys eventually realise it.
> > > >
> > > > The i915 GEM code get rewritten and refactored quite often
> > >
> > > Well, we continually try to improve things and accommodate upcoming
> > > product requirements and hardware feature. The thing is that we do it
> > > upstream first, so hard to avoid the churn.
> > >
> > > When moving rapidly, it's hard to project what the shared portion of
> > > the code might be or exactly look. I'm myself much more believer in
> > > extracting the generic portions out when the code is there and working.
> > >
> > > The churn would be even bigger if there is a strict requirement to
> > > refactor the common parts out straight from the beginning, before
> > > anything is even proven in practice.
> > >
> > > That'd just mean it gets much harder to sell doing things upstream
> > > first to the management.
> >
> > What does upstream first have to do with contributing to shared code?
> > There is a common misconception in big companies that if you utilize
> > shared infrastructure it will slow you down or you'll lose control of
> > your code which is I think what you are referring to.
>
> That'd be the wrong impression. Quite the contrary, the biggest chunk
> of code here, buddy allocator, was written specifically with the fact
> in mind that we can easily move it to DRM core.
>
> And I think we have a solid track history of contributing to the DRM
> core, so I'm not sure why you'd think that.

I didn't say you don't have a track record of contributing; my point
was that you only seem to use what you contributed in the first place.
This is another example.  I haven't seen Intel use any infrastructure
they didn't write directly.  Off the top of my head:
- drm GPU scheduler
- TTM
- Gallium
- Multi-media (creating VAAPI when there were arguably superior
alternatives, VDPAU and OpenMAX)
- OpenCL

In all those cases Intel did something equivalent in parallel, pushing
its projects while trying to downplay the existing projects.  This
just seems like more of that.

>
> > Ultimately, it
> > does sometimes raise the bar, but in the long term it benefits
> > everyone and usually it doesn't really add that much overhead.
>
> That we both agree on. But in the case of this specific patch series
> it is about rewriting the driver to TTM, which is quite an overhead.
>
> And if most of the important code is shared anyway through DRM core,
> doing the rewrite + covering all the subtle differences, does not
> sound too compelling.

Sometimes it is a lot of overhead.  That's the pact you make when you
put code upstream.  The community wants the best solution for everyone
and to be able to maintain it if you don't.  Sometimes that means
extra overhead.

At the end of the day, I don't really care that much.  I get it, we
all have large projects with scarce resources.  I just think a few
years down the road we'll all regret it as a community.

Alex
Christian König Feb. 26, 2019, 6:52 p.m. UTC | #9
On 26.02.19 at 18:20, Alex Deucher wrote:
> On Tue, Feb 26, 2019 at 7:17 AM Joonas Lahtinen
> <joonas.lahtinen@linux.intel.com> wrote:
>> Quoting Alex Deucher (2019-02-25 21:31:43)
>>> On Mon, Feb 25, 2019 at 9:35 PM Joonas Lahtinen
>>> <joonas.lahtinen@linux.intel.com> wrote:
>>>> Quoting Dave Airlie (2019-02-25 12:24:48)
>>>>> On Tue, 19 Feb 2019 at 23:32, Joonas Lahtinen
>>>>> <joonas.lahtinen@linux.intel.com> wrote:
>>>>>> + dri-devel mailing list, especially for the buddy allocator part
>>>>>>
>>>>>> Quoting Dave Airlie (2019-02-15 02:47:07)
>>>>>>> On Fri, 15 Feb 2019 at 00:57, Matthew Auld <matthew.auld@intel.com> wrote:
>>>>>>>> In preparation for upcoming devices with device local memory, introduce the
>>>>>>>> concept of different memory regions, and a simple buddy allocator to manage
>>>>>>>> them.
>>>>>>> This is missing the information on why it's not TTM.
>>>>>>>
>>>>>>> Nothing against extending i915 gem off into doing stuff we already
>>>>>>> have examples off in tree, but before you do that it would be good to
>>>>>>> have a why we can't use TTM discussion in public.
>>>>>> Glad that you asked. It's my fault that it was not included in
>>>>>> the cover letter. I anticipated the question, but was travelling
>>>>>> for a couple of days at the time this was sent. I didn't want
>>>>>> to write a hasty explanation and then disappear, leaving others to
>>>>>> take the heat.
>>>>>>
>>>>>> So here goes the less-hasty version:
>>>>>>
>>>>>> We did an analysis on the effort needed vs benefit gained of using
>>>>>> TTM when this was started initially. The conclusion was that we
>>>>>> already share the interesting bits of code through core DRM, really.
>>>>>>
>>>>>> Re-writing the memory handling to TTM would buy us more fine-grained
>>>>>> locking. But it's more a trait of rewriting the memory handling with
>>>>>> the information we have learned, than rewriting it to use TTM :)
>>>>>>
>>>>>> And further, we've been getting rid of struct_mutex at a steady phase
>>>>>> in the past years, so we have a clear path to the fine-grained locking
>>>>>> already in the not-so-distant future. With all this we did not see
>>>>>> much gained from converting over, as the code sharing is already
>>>>>> substantial.
>>>>>>
>>>>>> We also wanted to have the buddy allocator instead of a for loop making
>>>>>> drm_mm allocations to make sure we can keep the memory fragmentation
>>>>>> at bay. The intent is to move the buddy allocator to core DRM, to the
>>>>>> benefit of all the drivers, if there is interest from community. It has
>>>>>> been written as a strictly separate component with that in mind.
>>>>>>
>>>>>> And if you take the buddy allocator out of the patch set, the rest is
>>>>>> mostly just vfuncing things up to be able to have different backing
>>>>>> storages for objects. We took the opportunity to move over to the more
>>>>>> valgrind friendly mmap while touching things, but it's something we
>>>>>> have been contemplating anyway. And yeah, loads of selftests.
>>>>>>
>>>>>> That's really all that needed adding, and most of it is internal to
>>>>>> i915 and not to do with uAPI. This means porting over an userspace
>>>>>> driver doesn't require a substantial rewrite, but adding new a few
>>>>>> new IOCTLs to set the preferred backing storage placements.
>>>>>>
>>>>>> All the previous GEM abstractions keep applying, so we did not see
>>>>>> a justification to rewrite the kernel driver and userspace drivers.
>>>>>> It would have just to made things look like TTM, when we already
>>>>>> have the important parts of the code shared with TTM drivers
>>>>>> behind the GEM interfaces which all our drivers sit on top of.
>>>>> a) you guys should be the community as well, if the buddy allocator is
>>>>> useful in the core DRM get out there and try and see if anyone else
>>>>> has a use case for it, like the GPU scheduler we have now (can i915
>>>>> use that yet? :-)
>>>> Well, the buddy allocator should be useful for anybody wishing to have
>>>> as continuous physical allocations as possible. I have naively assumed
>>>> that would be almost everyone. So it would be only a question if others
>>>> see the amount of work required to convert over is justified for them.
>>>>
>>>> For the common DRM scheduler, I think a solid move from the beginning
>>>> would have been to factor out the i915 scheduler as it was most advanced
>>>> in features :) Now there is a way more trivial common scheduler core with
>>>> no easy path to transition without a feature regression.
>>> Can you elaborate?  What features are missing from the drm gpu scheduler?
>> Priority based pre-emption is the big thing coming to mind. But maybe we
>> should not derail the discussion in this thread. The discussion should
>> be in the archives, or we can start a new thread.
> Probably not worth continuing here, but I think there are probably
> features that the drm scheduler has that the i915 scheduler does not.
> For example, engine load balancing.  So i wouldn't necessarily say
> it's more advanced, just different feature sets.  We could all be
> enjoying all of these features if we all worked on the same
> infrastructure.

Actually it is worth continuing. To be honest, the DRM scheduler came
first, and the i915 scheduler should have been pushed back a bit more.

Additionally, it has supported priority-based pre-emption from the
very beginning; you guys just didn't care to take a look :(

Regards,
Christian.

>
>>>> We'd have to rewrite many of the more advanced features for that codebase
>>>> before we could transition over. It's hard to justify such work, for
>>>> that it would buy us very little compared to amount of work.
>>>>
>>>> Situation would be different if there was something gained from
>>>> switching over. This would be the situation if the more advanced
>>>> scheduler was picked as the shared codebase.
>>>>
>>>>> b) however this last two paragraphs fill me with no confidence that
>>>>> you've looked at TTM at all. It sounds like you took comments about
>>>>> TTM made 10 years ago, and didn't update them. There should be no
>>>>> major reason for a uapi change just because you adopt TTM. TTM hasn't
>>>>> ever had a common uapi across drivers upstream, one was proposed
>>>>> initially > 10 years ago.
>>>> This is one part my confusion on what the question was for and other
>>>> part bad wording on my behalf.
>>>>
>>>> So an attempt to re-answer: When this effort was started it was obvious
>>>> that the amount of new code required was low (as you can see). Feedback
>>>> about what converting to TTM would require vs. give us was gathered from
>>>> folks including Daniel and Chris and the unanimous decision was that it
>>>> would not be justifiable.
>>>>
>>>>> All the current TTM using drivers except
>>>>> vmware use a GEM based API as well. TTM is an internal driver helper
>>>>> for managing pools of RAM.
>>>>>
>>>>> I'm just not sure what rebuilding a chunk of shared code inside the
>>>>> i915 driver is buying you, except a transition path into divergence
>>>>> from all the other discrete RAM drivers.
>>>> I guess what I'm trying to say, there isn't that much code being added
>>>> that wouldn't be there anyway. The i915 GEM code has already grown to
>>>> what it is before this.
>>>>
>>>> Adding the suggested smaller amount of code vs. doing a much bigger
>>>> rewrite is something of a straightforward choice with the amount of
>>>> platforms and features in flight, especially when the end result is
>>>> the same.
>>> Because you will probably never do it.  It's almost always easier to
>>> just incrementally add on to existing code.  One could argue that GEM
>>> evolved into more or less the exact same thing as TTM anyway so why
>>> not bite the bullet and switch at some point?  TTM works fine even for
>>> UMA hardware.
>> By my understanding there are quite a few differences in low on memory
>> handling and other more subtle aspects between the two.
>> Converting over to TTM would be a pretty big rewrite of i915, and the
>> code sharing is already happening for important parts.
>>
>>>>> Like the gallium aversion in
>>>>> userspace, having a TTM aversion in kernel space is going to be the
>>>>> wrong path, and I'd rather not have to clean it up in 5 years when you
>>>>> guys eventually realise it.
>>>>>
>>>>> The i915 GEM code get rewritten and refactored quite often
>>>> Well, we continually try to improve things and accommodate upcoming
>>>> product requirements and hardware feature. The thing is that we do it
>>>> upstream first, so hard to avoid the churn.
>>>>
>>>> When moving rapidly, it's hard to project what the shared portion of
>>>> the code might be or exactly look. I'm myself much more believer in
>>>> extracting the generic portions out when the code is there and working.
>>>>
>>>> The churn would be even bigger if there is a strict requirement to
>>>> refactor the common parts out straight from the beginning, before
>>>> anything is even proven in practice.
>>>>
>>>> That'd just mean it gets much harder to sell doing things upstream
>>>> first to the management.
>>> What does upstream first have to do with contributing to shared code?
>>> There is a common misconception in big companies that if you utilize
>>> shared infrastructure it will slow you down or you'll lose control of
>>> your code which is I think what you are referring to.
>> That'd be the wrong impression. Quite the contrary, the biggest chunk
>> of code here, buddy allocator, was written specifically with the fact
>> in mind that we can easily move it to DRM core.
>>
>> And I think we have a solid track history of contributing to the DRM
>> core, so I'm not sure why you'd think that.
> I didn't say you don't have a track record of contributing, my point
> was that you only seem to use what you contributed in the first place.
> This is another example.  I haven't seen Intel use any infrastructure
> they didn't write directly.  Off the top of my head:
> - drm GPU scheduler
> - TTM
> - Gallium
> - Multi-media (creating VAAPI when there were arguably superior
> alternatives, VDPAU and OpenMAX)
> - OpenCL
>
> In all those cases Intel did something equivalent in parallel, pushing
> its projects while trying to downplay the existing projects.  This
> just seems like more of that.
>
>>> Ultimately, it
>>> does sometimes raise the bar, but in the long term it benefits
>>> everyone and usually it doesn't really add that much overhead.
>> That we both agree on. But in the case of this specific patch series
>> it is about rewriting the driver to TTM, which is quite an overhead.
>>
>> And if most of the important code is shared anyway through DRM core,
>> doing the rewrite + covering all the subtle differences, does not
>> sound too compelling.
> Sometimes it is a lot of overhead.  That's the pact you make when you
> put code upstream.  The community wants the best solution for everyone
> and to be able to maintain it if you don't.  Sometimes that means
> extra overhead.
>
> At the end of the day, I don't really care that much.  I get it, we
> all have large projects with scarce resources.  I just think a few
> years down the road we'll all regret it as a community.
>
> Alex
Alex Deucher Feb. 26, 2019, 7:14 p.m. UTC | #10
On Tue, Feb 26, 2019 at 12:20 PM Alex Deucher <alexdeucher@gmail.com> wrote:
>
> On Tue, Feb 26, 2019 at 7:17 AM Joonas Lahtinen
> <joonas.lahtinen@linux.intel.com> wrote:
> >
> > Quoting Alex Deucher (2019-02-25 21:31:43)
> > > On Mon, Feb 25, 2019 at 9:35 PM Joonas Lahtinen
> > > <joonas.lahtinen@linux.intel.com> wrote:
> > > >
> > > > Quoting Dave Airlie (2019-02-25 12:24:48)
> > > > > On Tue, 19 Feb 2019 at 23:32, Joonas Lahtinen
> > > > > <joonas.lahtinen@linux.intel.com> wrote:
> > > > > >
> > > > > > + dri-devel mailing list, especially for the buddy allocator part
> > > > > >
> > > > > > Quoting Dave Airlie (2019-02-15 02:47:07)
> > > > > > > On Fri, 15 Feb 2019 at 00:57, Matthew Auld <matthew.auld@intel.com> wrote:
> > > > > > > >
> > > > > > > > In preparation for upcoming devices with device local memory, introduce the
> > > > > > > > concept of different memory regions, and a simple buddy allocator to manage
> > > > > > > > them.
> > > > > > >
> > > > > > > This is missing the information on why it's not TTM.
> > > > > > >
> > > > > > > Nothing against extending i915 gem off into doing stuff we already
> > > > > > > have examples off in tree, but before you do that it would be good to
> > > > > > > have a why we can't use TTM discussion in public.
> > > > > >
> > > > > > Glad that you asked. It's my fault that it was not included in
> > > > > > the cover letter. I anticipated the question, but was travelling
> > > > > > for a couple of days at the time this was sent. I didn't want
> > > > > > to write a hasty explanation and then disappear, leaving others to
> > > > > > take the heat.
> > > > > >
> > > > > > So here goes the less-hasty version:
> > > > > >
> > > > > > We did an analysis on the effort needed vs benefit gained of using
> > > > > > TTM when this was started initially. The conclusion was that we
> > > > > > already share the interesting bits of code through core DRM, really.
> > > > > >
> > > > > > Re-writing the memory handling to TTM would buy us more fine-grained
> > > > > > locking. But it's more a trait of rewriting the memory handling with
> > > > > > the information we have learned, than rewriting it to use TTM :)
> > > > > >
> > > > > > And further, we've been getting rid of struct_mutex at a steady phase
> > > > > > in the past years, so we have a clear path to the fine-grained locking
> > > > > > already in the not-so-distant future. With all this we did not see
> > > > > > much gained from converting over, as the code sharing is already
> > > > > > substantial.
> > > > > >
> > > > > > We also wanted to have the buddy allocator instead of a for loop making
> > > > > > drm_mm allocations to make sure we can keep the memory fragmentation
> > > > > > at bay. The intent is to move the buddy allocator to core DRM, to the
> > > > > > benefit of all the drivers, if there is interest from community. It has
> > > > > > been written as a strictly separate component with that in mind.
> > > > > >
> > > > > > And if you take the buddy allocator out of the patch set, the rest is
> > > > > > mostly just vfuncing things up to be able to have different backing
> > > > > > storages for objects. We took the opportunity to move over to the more
> > > > > > valgrind friendly mmap while touching things, but it's something we
> > > > > > have been contemplating anyway. And yeah, loads of selftests.
> > > > > >
> > > > > > That's really all that needed adding, and most of it is internal to
> > > > > > i915 and not to do with uAPI. This means porting over an userspace
> > > > > > driver doesn't require a substantial rewrite, but adding new a few
> > > > > > new IOCTLs to set the preferred backing storage placements.
> > > > > >
> > > > > > All the previous GEM abstractions keep applying, so we did not see
> > > > > > a justification to rewrite the kernel driver and userspace drivers.
> > > > > > It would have just to made things look like TTM, when we already
> > > > > > have the important parts of the code shared with TTM drivers
> > > > > > behind the GEM interfaces which all our drivers sit on top of.
> > > > >
> > > > > a) you guys should be the community as well, if the buddy allocator is
> > > > > useful in the core DRM get out there and try and see if anyone else
> > > > > has a use case for it, like the GPU scheduler we have now (can i915
> > > > > use that yet? :-)
> > > >
> > > > Well, the buddy allocator should be useful for anybody wishing to have
> > > > as continuous physical allocations as possible. I have naively assumed
> > > > that would be almost everyone. So it would be only a question if others
> > > > see the amount of work required to convert over is justified for them.
> > > >
> > > > For the common DRM scheduler, I think a solid move from the beginning
> > > > would have been to factor out the i915 scheduler as it was most advanced
> > > > in features :) Now there is a way more trivial common scheduler core with
> > > > no easy path to transition without a feature regression.
> > >
> > > Can you elaborate?  What features are missing from the drm gpu scheduler?
> >
> > Priority based pre-emption is the big thing coming to mind. But maybe we
> > should not derail the discussion in this thread. The discussion should
> > be in the archives, or we can start a new thread.
>
> Probably not worth continuing here, but I think there are probably
> features that the drm scheduler has that the i915 scheduler does not.
> For example, engine load balancing.  So i wouldn't necessarily say
> it's more advanced, just different feature sets.  We could all be
> enjoying all of these features if we all worked on the same
> infrastructure.
>
> >
> > > > We'd have to rewrite many of the more advanced features for that codebase
> > > > before we could transition over. It's hard to justify such work, for
> > > > that it would buy us very little compared to amount of work.
> > > >
> > > > Situation would be different if there was something gained from
> > > > switching over. This would be the situation if the more advanced
> > > > scheduler was picked as the shared codebase.
> > > >
> > > > > b) however this last two paragraphs fill me with no confidence that
> > > > > you've looked at TTM at all. It sounds like you took comments about
> > > > > TTM made 10 years ago, and didn't update them. There should be no
> > > > > major reason for a uapi change just because you adopt TTM. TTM hasn't
> > > > > ever had a common uapi across drivers upstream, one was proposed
> > > > > initially > 10 years ago.
> > > >
> > > > This is one part my confusion on what the question was for and other
> > > > part bad wording on my behalf.
> > > >
> > > > So an attempt to re-answer: When this effort was started it was obvious
> > > > that the amount of new code required was low (as you can see). Feedback
> > > > about what converting to TTM would require vs. give us was gathered from
> > > > folks including Daniel and Chris and the unanimous decision was that it
> > > > would not be justifiable.
> > > >
> > > > > All the current TTM using drivers except
> > > > > vmware use a GEM based API as well. TTM is an internal driver helper
> > > > > for managing pools of RAM.
> > > > >
> > > > > I'm just not sure what rebuilding a chunk of shared code inside the
> > > > > i915 driver is buying you, except a transition path into divergence
> > > > > from all the other discrete RAM drivers.
> > > >
> > > > I guess what I'm trying to say, there isn't that much code being added
> > > > that wouldn't be there anyway. The i915 GEM code has already grown to
> > > > what it is before this.
> > > >
> > > > Adding the suggested smaller amount of code vs. doing a much bigger
> > > > rewrite is something of a straightforward choice with the amount of
> > > > platforms and features in flight, especially when the end result is
> > > > the same.
> > >
> > > Because you will probably never do it.  It's almost always easier to
> > > just incrementally add on to existing code.  One could argue that GEM
> > > evolved into more or less the exact same thing as TTM anyway so why
> > > not bite the bullet and switch at some point?  TTM works fine even for
> > > UMA hardware.
> >
> > By my understanding there are quite a few differences in low on memory
> > handling and other more subtle aspects between the two.
> > Converting over to TTM would be a pretty big rewrite of i915, and the
> > code sharing is already happening for important parts.
> >
> > > > > Like the gallium aversion in
> > > > > userspace, having a TTM aversion in kernel space is going to be the
> > > > > wrong path, and I'd rather not have to clean it up in 5 years when you
> > > > > guys eventually realise it.
> > > > >
> > > > > The i915 GEM code get rewritten and refactored quite often
> > > >
> > > > Well, we continually try to improve things and accommodate upcoming
> > > > product requirements and hardware feature. The thing is that we do it
> > > > upstream first, so hard to avoid the churn.
> > > >
> > > > When moving rapidly, it's hard to project what the shared portion of
> > > > the code might be or exactly look. I'm myself much more believer in
> > > > extracting the generic portions out when the code is there and working.
> > > >
> > > > The churn would be even bigger if there is a strict requirement to
> > > > refactor the common parts out straight from the beginning, before
> > > > anything is even proven in practice.
> > > >
> > > > That'd just mean it gets much harder to sell doing things upstream
> > > > first to the management.
> > >
> > > What does upstream first have to do with contributing to shared code?
> > > There is a common misconception in big companies that if you utilize
> > > shared infrastructure it will slow you down or you'll lose control of
> > > your code which is I think what you are referring to.
> >
> > That'd be the wrong impression. Quite the contrary, the biggest chunk
> > of code here, buddy allocator, was written specifically with the fact
> > in mind that we can easily move it to DRM core.
> >
> > And I think we have a solid track history of contributing to the DRM
> > core, so I'm not sure why you'd think that.
>
> I didn't say you don't have a track record of contributing, my point
> was that you only seem to use what you contributed in the first place.
> This is another example.  I haven't seen Intel use any infrastructure
> they didn't write directly.  Off the top of my head:
> - drm GPU scheduler
> - TTM
> - Gallium
> - Multi-media (creating VAAPI when there were arguably superior
> alternatives, VDPAU and OpenMAX)
> - OpenCL
>
> In all those cases Intel did something equivalent in parallel, pushing
> its projects while trying to downplay the existing projects.  This
> just seems like more of that.
>
> >
> > > Ultimately, it
> > > does sometimes raise the bar, but in the long term it benefits
> > > everyone and usually it doesn't really add that much overhead.
> >
> > That we both agree on. But in the case of this specific patch series
> > it is about rewriting the driver to TTM, which is quite an overhead.
> >
> > And if most of the important code is shared anyway through DRM core,
> > doing the rewrite + covering all the subtle differences, does not
> > sound too compelling.
>
> Sometimes it is a lot of overhead.  That's the pact you make when you
> put code upstream.  The community wants the best solution for everyone
> and to be able to maintain it if you don't.  Sometimes that means
> extra overhead.
>
> At the end of the day, I don't really care that much.  I get it, we
> all have large projects with scarce resources.  I just think a few
> years down the road we'll all regret it as a community.

AMD and others have also spent years tuning TTM for both UMA and VRAM,
but especially VRAM.  It comes across as a bit daft to complain about
the effort of moving to TTM while saying nothing about the effort of
tuning GEM for optimal VRAM performance, effort that has already been
expended and that you could take advantage of.

Alex
Dave Airlie Feb. 26, 2019, 11:04 p.m. UTC | #11
> > At the end of the day, I don't really care that much.  I get it, we
> > all have large projects with scarce resources.  I just think a few
> > years down the road we'll all regret it as a community.
>
> AMD and others have also spent years tuning TTM for both UMA and VRAM,
> but especially VRAM.  It comes across a bit daft to complain about the
> effort to move to TTM, but say nothing about the effort to tune GEM
> for optimal VRAM performance.  Effort that has already been expended
> that you could take advantage of.

I'm with Alex here. The patches you have now are just the start; I
realise you think they are all you'll need, but I expect that once Chris
gets going with real VRAM hardware he'll generate reams of stuff.

People have been tuning and making TTM run on VRAM-using GPUs for
longer than you've been making VRAM-using GPUs, so there had better be
good and well-thought-out reasons for avoiding it, and so far you
haven't made that argument to me at all. In fact, your scheduler
argument works against you: if we should have abstracted the i915
scheduler out and used it because it had more features and
pre-existed, then i915 should be using TTM, since it's already
abstracted out and has more features.

We've already pulled other pieces out of TTM, like reservation
objects, and I don't think i915 uses those yet either, when it clearly
could. Maybe if we started by fixing that, moving to TTM would be less
of a problem.
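
For concreteness, a rough sketch of wiring an object up to the
reservation_object interface of that era (~v5.0, later renamed
dma_resv); only the reservation_* calls are the real API, the object
layout and helper names are invented for the example:

/*
 * Illustrative only: a driver-private object embedding a
 * reservation_object and using it to track the fence of a write.
 */
#include <linux/reservation.h>
#include <linux/dma-fence.h>

struct example_obj {
        struct reservation_object resv;
        /* ... backing pages, vma lists, etc ... */
};

static void example_obj_init(struct example_obj *obj)
{
        reservation_object_init(&obj->resv);
}

static int example_obj_track_write(struct example_obj *obj,
                                   struct dma_fence *fence)
{
        int err;

        /* ww_mutex based lock shared with every other reservation user */
        err = reservation_object_lock(&obj->resv, NULL);
        if (err)
                return err;

        /* a write installs an exclusive fence that readers must wait on */
        reservation_object_add_excl_fence(&obj->resv, fence);
        reservation_object_unlock(&obj->resv);
        return 0;
}

That shared ww_mutex locking and fence bookkeeping is exactly what
dma-buf implicit fencing and the existing TTM drivers already build on.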

Anyway, I expect moving to TTM is a big change for i915, and probably
more than you are able to bite off at present, but I'm going to be
watching closely what you add on top of this sort of thing, and if it
starts getting larger and messier as you tune it, I'll have to start
reconsidering how big a NO I have to use.

Dave.
Christian König Feb. 27, 2019, 12:17 p.m. UTC | #12
Am 27.02.19 um 00:04 schrieb Dave Airlie:
>>> At the end of the day, I don't really care that much.  I get it, we
>>> all have large projects with scarce resources.  I just think a few
>>> years down the road we'll all regret it as a community.
>> AMD and others have also spent years tuning TTM for both UMA and VRAM,
>> but especially VRAM.  It comes across a bit daft to complain about the
>> effort to move to TTM, but say nothing about the effort to tune GEM
>> for optimal VRAM performance.  Effort that has already been expended
>> that you could take advantage of.
> I'm with Alex here, the patches you have now are just the start, I
> realise you think they are all you'll need, but I expect once Chris
> gets going with real VRAM hardware he'll generate reams for stuff.
>
> People have been tuning and making TTM run on VRAM using GPUs for
> longer than you've been making VRAM using GPUs, there had better be
> good and well thought out reasons for avoiding using it, and so far
> you haven't made that argument to me all. In fact your scheduler
> arguments works against you. If we should have abstracted i915
> scheduler out and used it because it had more features and
> pre-existed, then i915 should be using TTM since it's already
> abstracted out and has more features.
>
> Like we've pulled other stuff out of TTM like reservation objects, I
> don't think i915 uses those yet either when it clearly could be. Maybe
> if we started by fixing that, moving to TTM would be less of a
> problem.

Just to make it extra clear: At least I absolutely won't mind if we 
decommission TTM further!

We have optimized TTM as much as we could without a fundamental design
change, but essentially there are a couple of problems we can't fix
without touching all drivers at once.

For example, the layered design, where TTM calls back into the driver
to move stuff around or allocate something from a domain, really needs
to go away.

So my suggestion is that we fillet TTM into multiple independent
components which a) can be used to implement the existing TTM interface
and b) implement clean and encapsulated functionality.

Those components can then be used by drivers independently of TTM to 
implement the necessary MM.
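
To sketch the difference in shape (with invented types; nothing below is
actual TTM or i915 code): in the layered model the core owns the flow and
calls back into the driver, while a component is something the driver
calls into only when it decides to, so it can be reused without adopting
the framework around it.

/* Illustrative contrast only; none of these types exist in TTM or i915. */
#include <stddef.h>

/*
 * Layered shape: the core drives the state machine and calls back into
 * the driver, entangling MM policy with driver internals.
 */
struct layered_driver_funcs {
        int (*alloc_from_domain)(void *drv, unsigned int domain, size_t size);
        int (*move_buffer)(void *drv, void *bo, unsigned int new_domain);
};

int layered_core_validate(const struct layered_driver_funcs *f,
                          void *drv, void *bo, unsigned int domain,
                          size_t size)
{
        int err = f->alloc_from_domain(drv, domain, size);

        if (err)
                return err;
        /* the core, not the driver, decides when data moves */
        return f->move_buffer(drv, bo, domain);
}

/*
 * Component shape: a standalone allocator object with no callbacks into
 * the driver; the driver calls it when it wants to, nothing more.
 */
struct range_component {
        size_t free;    /* toy state standing in for a real allocator */
};

int component_alloc(struct range_component *mm, size_t size)
{
        if (size > mm->free)
                return -1;
        mm->free -= size;
        return 0;
}

int main(void)
{
        struct range_component vram = { .free = 1024 };

        return component_alloc(&vram, 64); /* driver asks, component answers */
}

The buddy allocator in this series already has that second shape, which
is presumably what would make lifting it into the DRM core relatively
painless.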

Regards,
Christian.

>
> Anyways, I expect moving to TTM is a big change for i915, and probably
> more than you are able to bite off at present, but I'm going to be
> watching closely what stuff you add on top of this sort of thing, and
> if it starts getting large and messier as you tune it, I'll have to
> start reconsidering how big a NO I have to use.
>
> Dave.
Joonas Lahtinen Feb. 27, 2019, 2:40 p.m. UTC | #13
Quoting Christian König (2019-02-27 04:17:01)
> Am 27.02.19 um 00:04 schrieb Dave Airlie:
> >>> At the end of the day, I don't really care that much.  I get it, we
> >>> all have large projects with scarce resources.  I just think a few
> >>> years down the road we'll all regret it as a community.
> >> AMD and others have also spent years tuning TTM for both UMA and VRAM,
> >> but especially VRAM.  It comes across a bit daft to complain about the
> >> effort to move to TTM, but say nothing about the effort to tune GEM
> >> for optimal VRAM performance.  Effort that has already been expended
> >> that you could take advantage of.
> > I'm with Alex here, the patches you have now are just the start, I
> > realise you think they are all you'll need, but I expect once Chris
> > gets going with real VRAM hardware he'll generate reams for stuff.
> >
> > People have been tuning and making TTM run on VRAM using GPUs for
> > longer than you've been making VRAM using GPUs, there had better be
> > good and well thought out reasons for avoiding using it, and so far
> > you haven't made that argument to me all. In fact your scheduler
> > arguments works against you. If we should have abstracted i915
> > scheduler out and used it because it had more features and
> > pre-existed, then i915 should be using TTM since it's already
> > abstracted out and has more features.
> >
> > Like we've pulled other stuff out of TTM like reservation objects, I
> > don't think i915 uses those yet either when it clearly could be. Maybe
> > if we started by fixing that, moving to TTM would be less of a
> > problem.
> 
> Just to make it extra clear: At least I absolutely won't mind if we 
> decommission TTM further!
> 
> We have optimized TTM as much as we could without a fundamental design 
> change, but essentially there are a couple of problem we can't fix 
> without touching all drivers at once.
> 
> For example the layered design where TTM calls back into the driver to 
> move stuff around or allocate something from a domain really needs to go 
> away.
> 
> So my suggestion is that we filleting TTM into multiple independent 
> components which a) can be used to implement the existing TTM interface 
> and b) implement a clean and encapsulated functionality.
> 
> Those components can then be used by drivers independently of TTM to 
> implement the necessary MM.

This is exactly what I was hoping we could do, too. So I'm glad to hear
this suggestion. 

Incrementally extracting and sharing the components would cause less
disruption than halting progress while doing a major rewrite of the
driver.

As I mentioned on IRC, one good step for both the scheduler and memory
management would be to actually map out the feature sets. It is clear
that there is confusion about what the feature sets of the respective
components are (at least I seem to be confused about TTM and the DRM
scheduler).

Then, once we understand what we actually have, it should be easier to
start discussing which components could be reused.

I think one good way to take this on would be to look into sharing as
much of the test assets as possible. We have plenty of testcases
exercising the low-on-memory conditions and scheduling corner cases
that we've had to tackle. And I'm sure there are tests for the
above-mentioned TTM VRAM tuning, too.

I'll look into and discuss the reservation objects Dave mentioned, and
I'm happy to hear about other suggestions.

I'd also like to hear comments about the buddy allocator suggestion. It
should help in enabling >4K page support for pretty much any driver.
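
To make the >4K point concrete, here is a tiny userspace sketch of
buddy-style sizing (not the actual allocator API from the series):
requests round up to a power-of-two order of the minimum chunk, so they
land on naturally aligned 64K/2M/1G blocks when those are free, and a
freed block can check for its merge partner with a single XOR, which is
what keeps fragmentation in check. Real implementations also split
awkwardly sized requests across several blocks to limit waste.

#include <stdint.h>
#include <stdio.h>

#define MIN_CHUNK_SHIFT 12      /* 4K minimum chunk */

/* Smallest power-of-two order (of the minimum chunk) that covers 'size'. */
static unsigned int size_to_order(uint64_t size)
{
        unsigned int order = 0;

        while ((1ull << (MIN_CHUNK_SHIFT + order)) < size)
                order++;
        return order;
}

/*
 * A block's buddy differs only in the bit selecting its order, so
 * checking whether a freed block can merge back is a constant-time XOR.
 */
static uint64_t buddy_offset(uint64_t offset, unsigned int order)
{
        return offset ^ (1ull << (MIN_CHUNK_SHIFT + order));
}

int main(void)
{
        /* a single-block allocation for 1.5M rounds up to a 2M, order-9 block */
        printf("order for 1.5M: %u\n", size_to_order(3ull << 19));
        /* the 2M block at offset 2M merges with the one at offset 0 */
        printf("buddy of 0x200000 at order 9: 0x%llx\n",
               (unsigned long long)buddy_offset(0x200000, 9));
        return 0;
}

All a driver really needs for huge GTT pages is the guarantee that an
order-N block is 2^N minimum chunks in size and aligned to that size,
which is what the scheme gives by construction.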

Regards, Joonas

> Regards,
> Christian.
> 
> >
> > Anyways, I expect moving to TTM is a big change for i915, and probably
> > more than you are able to bite off at present, but I'm going to be
> > watching closely what stuff you add on top of this sort of thing, and
> > if it starts getting large and messier as you tune it, I'll have to
> > start reconsidering how big a NO I have to use.
> >
> > Dave.