[0/3] Support 64 bpp half float formats

Message ID 1543473493-30973-1-git-send-email-kevin.strasser@intel.com (mailing list archive)

Message

Kevin Strasser Nov. 29, 2018, 6:38 a.m. UTC
This series defines new formats and adds a plane property to be used for
floating point framebuffer content. Implementation is then added to i915.

I have shared an IGT branch which adds test coverage for the new formats:
  https://github.com/strassek/xorg-intel-gpu-tools/tree/fp16

Kevin Strasser (3):
  drm/fourcc: Add 64 bpp half float formats
  drm: Add optional PIXEL_NORMALIZE_RANGE property to drm_plane
  drm/i915: Implement half float formats and pixel normalize property

 drivers/gpu/drm/drm_atomic.c         |  2 +
 drivers/gpu/drm/drm_atomic_uapi.c    |  4 ++
 drivers/gpu/drm/drm_color_mgmt.c     | 67 +++++++++++++++++++++++
 drivers/gpu/drm/drm_crtc_internal.h  |  1 +
 drivers/gpu/drm/drm_fourcc.c         |  4 ++
 drivers/gpu/drm/i915/i915_reg.h      | 15 ++++-
 drivers/gpu/drm/i915/intel_display.c | 47 ++++++++++++++++
 drivers/gpu/drm/i915/intel_drv.h     |  5 ++
 drivers/gpu/drm/i915/intel_sprite.c  | 82 ++++++++++++++++++++++++++--
 include/drm/drm_color_mgmt.h         |  9 +++
 include/drm/drm_fourcc.h             |  3 +
 include/drm/drm_plane.h              | 14 +++++
 include/uapi/drm/drm_fourcc.h        |  6 ++
 13 files changed, 252 insertions(+), 7 deletions(-)

Comments

Ville Syrjälä Nov. 29, 2018, 7:26 p.m. UTC | #1
On Wed, Nov 28, 2018 at 10:38:10PM -0800, Kevin Strasser wrote:
> This series defines new formats and adds a plane property to be used for
> floating point framebuffer content. Implementation is then added to i915.
> 
> I have shared an IGT branch which adds test coverage for the new formats:
>   https://github.com/strassek/xorg-intel-gpu-tools/tree/fp16

This looks similar to what I had written. I wrote my half<->full
conversion from scratch, which probably means it has more rounding
errors and whatnot. The speed of mine wasn't exactly stellar, and it
looks like your version probably has the same issue. So I was actually
thinking that using the sse<something> instructions meant for this
could provide a nice speedup. I guess we might want the pure C version
as a backup though. Hmm. Now I also seem to recall noticing a compiler
intrinsic even for single-value half<->full precision conversion. Did
you look into using that (if I didn't imagine it)?

BTW I just rebased my fp16 for pre-icl platforms:
git://github.com/vsyrjala/linux.git fp16_scanout_2

Apart from the ivb/hsw w/a there isn't all that much unexpected
when it comes to fp16 on those platforms either.

Kevin Strasser Nov. 29, 2018, 9:39 p.m. UTC | #2
Ville Syrjälä wrote:
> On Wed, Nov 28, 2018 at 10:38:10PM -0800, Kevin Strasser wrote:
>> This series defines new formats and adds a plane property to be used for
>> floating point framebuffer content. Implementation is then added to i915.
>>
>> I have shared an IGT branch which adds test coverage for the new formats:
>>   https://github.com/strassek/xorg-intel-gpu-tools/tree/fp16
>
> This looks similar to what I had written. I wrote my half<->full
> conversion from scratch, which probably means it has more rounding
> errors and whatnot. The speed of mine wasn't exactly stellar, and it
> looks like your version probably has the same issue. So I was actually
> thinking that using the sse<something> instructions meant for this
> could provide a nice speedup. I guess we might want the pure C version
> as a backup though. Hmm. Now I also seem to recall noticing a compiler
> intrinsic even for single-value half<->full precision conversion. Did
> you look into using that (if I didn't imagine it)?

You are thinking of vcvtps2ph and vcvtph2ps. I haven't had a chance to
give them a try yet, but I agree it seems like a good idea.
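
For reference, the pure C backup mentioned in the quoted text could look
roughly like the sketch below. It is not the code from this series or from
the IGT branch; the names half_to_float() and float_to_half() are
hypothetical and chosen only for illustration. It assumes IEEE 754 binary16
and binary32 layouts and rounds to nearest even when narrowing. On x86,
vcvtph2ps and vcvtps2ph belong to the F16C extension, and compilers expose
them through <immintrin.h> (for example _mm_cvtph_ps() and _mm_cvtps_ph(),
plus the scalar _cvtsh_ss() and _cvtss_sh()), so a faster path would
presumably call those when the CPU supports them and fall back to something
like this otherwise.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen an IEEE 754 binary16 value to a float. */
float half_to_float(uint16_t h)
{
	uint32_t sign = (uint32_t)(h & 0x8000) << 16;
	uint32_t exp = (h >> 10) & 0x1f;
	uint32_t mant = h & 0x3ff;
	uint32_t bits;
	float f;

	if (exp == 0x1f) {
		/* Inf or NaN: all-ones exponent, keep the payload bits. */
		bits = sign | 0x7f800000 | (mant << 13);
	} else if (exp != 0) {
		/* Normal number: rebias the exponent from 15 to 127. */
		bits = sign | ((exp + 112) << 23) | (mant << 13);
	} else if (mant != 0) {
		/* Subnormal half: normalize it, floats have range to spare. */
		exp = 113;
		while (!(mant & 0x400)) {
			mant <<= 1;
			exp--;
		}
		bits = sign | (exp << 23) | ((mant & 0x3ff) << 13);
	} else {
		/* Signed zero. */
		bits = sign;
	}

	memcpy(&f, &bits, sizeof(f));
	return f;
}

/* Hypothetical helper: narrow a float to binary16, round to nearest even. */
uint16_t float_to_half(float f)
{
	uint32_t bits, sign, exp, mant, shift, half, rem, halfway;

	memcpy(&bits, &f, sizeof(bits));
	sign = (bits >> 16) & 0x8000;
	exp = (bits >> 23) & 0xff;
	mant = bits & 0x7fffff;

	if (exp == 0xff)	/* Inf or NaN; force a mantissa bit for NaN */
		return (uint16_t)(sign | 0x7c00 | (mant ? 0x200 : 0));
	if (exp >= 143)		/* magnitude >= 2^16: overflow to infinity */
		return (uint16_t)(sign | 0x7c00);
	if (exp < 102)		/* magnitude < 2^-25: underflow to zero */
		return (uint16_t)sign;

	if (exp >= 113) {
		/* Result is a normal half: drop 13 mantissa bits. */
		half = (exp - 112) << 10;
		shift = 13;
	} else {
		/* Result is a subnormal half: make the leading 1 explicit. */
		mant |= 0x800000;
		half = 0;
		shift = 126 - exp;
	}

	rem = mant & ((1u << shift) - 1);
	halfway = 1u << (shift - 1);
	half += mant >> shift;

	/* A carry out of the mantissa correctly bumps the exponent. */
	if (rem > halfway || (rem == halfway && (half & 1)))
		half++;

	return (uint16_t)(sign | half);
}
```

Every non-NaN half value should survive a round trip through these two
functions unchanged, which is a cheap sanity check to run before comparing
against the hardware instructions.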

> BTW I just rebased my fp16 for pre-icl platforms:
> git://github.com/vsyrjala/linux.git fp16_scanout_2
>
> Apart from the ivb/hsw w/a there isn't all that much unexpected
> when it comes to fp16 on those platforms either.

I don't mean to step on your toes with this series. Were you waiting for,
or working on, a real use case before pushing that code?

Thanks,
Kevin
Ville Syrjälä Nov. 30, 2018, 2:15 p.m. UTC | #3
On Thu, Nov 29, 2018 at 09:39:52PM +0000, Strasser, Kevin wrote:
> Ville Syrjälä wrote:
> > On Wed, Nov 28, 2018 at 10:38:10PM -0800, Kevin Strasser wrote:
> >> This series defines new formats and adds a plane property to be used for
> >> floating point framebuffer content. Implementation is then added to i915.
> >>
> >> I have shared an IGT branch which adds test coverage for the new formats:
> >>   https://github.com/strassek/xorg-intel-gpu-tools/tree/fp16
> >
> > This looks similar to what I had written. I wrote my half<->full
> > conversion from scratch, which probably means it has more rounding
> > errors and whatnot. The speed of mine wasn't exactly stellar, and it
> > looks like your version probably has the same issue. So I was actually
> > thinking that using the sse<something> instructions meant for this
> > could provide a nice speedup. I guess we might want the pure C version
> > as a backup though. Hmm. Now I also seem to recall noticing a compiler
> > intrinsic even for single-value half<->full precision conversion. Did
> > you look into using that (if I didn't imagine it)?
> 
> You are thinking of vcvtps2ph and vcvtph2ps. I haven't had a chance to
> give them a try yet, but I agree it seems like a good idea.
> 
> > BTW I just rebased my fp16 for pre-icl platforms:
> > git://github.com/vsyrjala/linux.git fp16_scanout_2
> >
> > Apart from the ivb/hsw w/a there isn't all that much unexpected
> > when it comes to fp16 on those platforms either.
> 
> I don't mean to step on your toes with this series. Were you waiting for,
> or working on, a real use case before pushing that code?

I pretty much just did it so that I could test >10bpc gamma LUTs, but
I got sidetracked by other things and didn't even get that far.
Another problem is that igt depends on cairo, which didn't support
rendering at >10bpc, so I couldn't really test that stuff properly even
if I wanted to. Maarten has patches to wire up floats into cairo, but I
think he just said that it still kinda uses only 8bpc precision :(

Anyways, the fact that you did icl and I did pre-icl is a pretty good
division of labour. Sometimes things work out by accident :)