Message ID | 1543473493-30973-1-git-send-email-kevin.strasser@intel.com (mailing list archive) |
---|---|
Series | Support 64 bpp half float formats |
On Wed, Nov 28, 2018 at 10:38:10PM -0800, Kevin Strasser wrote:
> This series defines new formats and adds a plane property to be used for
> floating point framebuffer content. Implementation is then added to i915.
>
> I have shared an IGT branch which adds test coverage for the new formats:
> https://github.com/strassek/xorg-intel-gpu-tools/tree/fp16

Looks about similar to what I had written. I wrote my half<->full conversion thing from scratch, which probably means it has more rounding errors and whatnot. The speed of mine wasn't exactly stellar, and it looks like your version probably has the same issue. So I was actually thinking that using the sse<something> instructions meant for this could provide a nice speedup. I guess we might want the pure C version as a backup though.

Hmm. Now I also seem to recall noticing that there is a compiler intrinsic even for single value half<->full precision conversion. Did you look into using that (if I didn't imagine it)?

BTW I just rebased my fp16 for pre-icl platforms:
git://github.com/vsyrjala/linux.git fp16_scanout_2

Apart from the ivb/hsw w/a there isn't all that much unexpected when it comes to fp16 on those platforms either.
>
> Kevin Strasser (3):
>   drm/fourcc: Add 64 bpp half float formats
>   drm: Add optional PIXEL_NORMALIZE_RANGE property to drm_plane
>   drm/i915: Implement half float formats and pixel normalize property
>
>  drivers/gpu/drm/drm_atomic.c         |  2 +
>  drivers/gpu/drm/drm_atomic_uapi.c    |  4 ++
>  drivers/gpu/drm/drm_color_mgmt.c     | 67 +++++++++++++++++++++++
>  drivers/gpu/drm/drm_crtc_internal.h  |  1 +
>  drivers/gpu/drm/drm_fourcc.c         |  4 ++
>  drivers/gpu/drm/i915/i915_reg.h      | 15 ++++-
>  drivers/gpu/drm/i915/intel_display.c | 47 ++++++++++++++++
>  drivers/gpu/drm/i915/intel_drv.h     |  5 ++
>  drivers/gpu/drm/i915/intel_sprite.c  | 82 ++++++++++++++++++++++++++--
>  include/drm/drm_color_mgmt.h         |  9 +++
>  include/drm/drm_fourcc.h             |  3 +
>  include/drm/drm_plane.h              | 14 +++++
>  include/uapi/drm/drm_fourcc.h        |  6 ++
>  13 files changed, 252 insertions(+), 7 deletions(-)
>
> --
> 2.17.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Ville Syrjälä wrote:
> On Wed, Nov 28, 2018 at 10:38:10PM -0800, Kevin Strasser wrote:
>> This series defines new formats and adds a plane property to be used for
>> floating point framebuffer content. Implementation is then added to i915.
>>
>> I have shared an IGT branch which adds test coverage for the new formats:
>> https://github.com/strassek/xorg-intel-gpu-tools/tree/fp16
>
> Looks about similar as what I had written. I wrote my half<->full
> conversion thing from scratch which probably means it has more rounding
> errors and whatnot. The speed of mine wasn't exactly stellar and looks
> like your version probably has the same issue. So I was actually
> thinking of using the sse<something> instructions meant for this
> could provide a nice speedup. I guess we might want the pure c version
> as a backup though. Hmm. Now I also seem to recall that I noticed
> there being a compiler intrinsic even for single value half<->full
> precision conversion. Did you look into using that (if I didn't imagine
> it)?

You are thinking of vcvtps2ph and vcvtph2ps. I haven't yet had a chance to give them a try, but I agree it seems like a good idea.

> BTW I just rebased my fp16 for pre-icl platforms:
> git://github.com/vsyrjala/linux.git fp16_scanout_2
>
> Apart from the ivb/hsw w/a there isn't all that much unexpected
> when it comes to fp16 on those platforms either.

I don't mean to step on your toes with this series. Were you waiting for / working on a real use case before pushing that code?

Thanks,
Kevin
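For readers following along, the two conversion routes mentioned in this exchange (a pure C fallback and the F16C instructions vcvtph2ps/vcvtps2ph) could be sketched roughly as below. This is only an illustration, not the code from either IGT branch: the function names `half_to_float`, `float_to_half` and their `_c` variants are hypothetical, and the F16C path is assumed to be enabled at build time via the compiler-defined `__F16C__` macro (for example with `-mf16c`). The single-value intrinsics used are `_cvtsh_ss` and `_cvtss_sh` from `<immintrin.h>`.

```c
#include <stdint.h>
#include <string.h>

#ifdef __F16C__
#include <immintrin.h>
#endif

/* Pure C fallback: expand IEEE 754 binary16 bits to a float. */
static inline float half_to_float_c(uint16_t h)
{
	uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
	uint32_t exp = (h >> 10) & 0x1fu;
	uint32_t mant = h & 0x3ffu;
	uint32_t bits;
	float f;

	if (exp == 0x1f) {		/* infinity or NaN */
		bits = sign | 0x7f800000u | (mant << 13);
	} else if (exp != 0) {		/* normal: rebias 15 -> 127 */
		bits = sign | ((exp + 112) << 23) | (mant << 13);
	} else if (mant != 0) {		/* subnormal: normalize first */
		uint32_t e = 113;

		while (!(mant & 0x400u)) {
			mant <<= 1;
			e--;
		}
		bits = sign | (e << 23) | ((mant & 0x3ffu) << 13);
	} else {			/* signed zero */
		bits = sign;
	}
	memcpy(&f, &bits, sizeof(f));
	return f;
}

/* Pure C fallback: float to binary16, round to nearest, ties to even. */
static inline uint16_t float_to_half_c(float f)
{
	uint32_t bits, abs, mant, half, rem, halfway;
	uint16_t sign;
	int e, shift;

	memcpy(&bits, &f, sizeof(bits));
	sign = (uint16_t)((bits >> 16) & 0x8000u);
	abs = bits & 0x7fffffffu;

	if (abs > 0x7f800000u)		/* NaN: return a quiet NaN */
		return (uint16_t)(sign | 0x7e00u);
	if (abs >= 0x47800000u)		/* >= 65536 or infinity */
		return (uint16_t)(sign | 0x7c00u);
	if (abs < 0x33000000u)		/* below 2^-25: rounds to zero */
		return sign;

	e = (int)(abs >> 23) - 112;	/* rebias 127 -> 15 */
	if (e >= 1) {			/* normal half */
		mant = abs & 0x7fffffu;
		shift = 13;
		half = ((uint32_t)e << 10) | (mant >> shift);
	} else {			/* subnormal half */
		mant = (abs & 0x7fffffu) | 0x800000u;
		shift = 14 - e;		/* 14..24 */
		half = mant >> shift;
	}
	rem = mant & ((1u << shift) - 1);
	halfway = 1u << (shift - 1);
	if (rem > halfway || (rem == halfway && (half & 1)))
		half++;			/* a carry may bump the exponent */
	return (uint16_t)(sign | half);
}

/* Use the F16C instructions when the build enables them. */
static inline float half_to_float(uint16_t h)
{
#ifdef __F16C__
	return _cvtsh_ss(h);		/* vcvtph2ps */
#else
	return half_to_float_c(h);
#endif
}

static inline uint16_t float_to_half(float f)
{
#ifdef __F16C__
	return _cvtss_sh(f, 0);		/* vcvtps2ph, 0 = round to nearest even */
#else
	return float_to_half_c(f);
#endif
}
```

Selecting the path at compile time keeps the sketch short; a real test suite might prefer a runtime CPU check so that one binary works on machines without F16C. For whole rows of pixels, the packed forms `_mm_cvtph_ps` and `_mm_cvtps_ph` convert four values per call.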
On Thu, Nov 29, 2018 at 09:39:52PM +0000, Strasser, Kevin wrote:
> Ville Syrjälä wrote:
> > On Wed, Nov 28, 2018 at 10:38:10PM -0800, Kevin Strasser wrote:
> >> This series defines new formats and adds a plane property to be used for
> >> floating point framebuffer content. Implementation is then added to i915.
> >>
> >> I have shared an IGT branch which adds test coverage for the new formats:
> >> https://github.com/strassek/xorg-intel-gpu-tools/tree/fp16
> >
> > Looks about similar as what I had written. I wrote my half<->full
> > conversion thing from scratch which probably means it has more rounding
> > errors and whatnot. The speed of mine wasn't exactly stellar and looks
> > like your version probably has the same issue. So I was actually
> > thinking of using the sse<something> instructions meant for this
> > could provide a nice speedup. I guess we might want the pure c version
> > as a backup though. Hmm. Now I also seem to recall that I noticed
> > there being a compiler intrinsic even for single value half<->full
> > precision conversion. Did you look into using that (if I didn't imagine
> > it)?
>
> You are thinking of vcvtps2ph and vcvtph2ps, I haven't yet had a chance to
> give them a try, but I agree it seems like a good idea.
>
> > BTW I just rebased my fp16 for pre-icl platforms:
> > git://github.com/vsyrjala/linux.git fp16_scanout_2
> >
> > Apart from the ivb/hsw w/a there isn't all that much unexpected
> > when it comes to fp16 on those platforms either.
>
> I don't mean to step on your toes with this series, were you waiting for /
> working on a real usecase before pushing that code?

I pretty much just did it so that I could test >10bpc gamma LUTs. But I got sidetracked by other things, so I didn't really get even that far. Another problem is that igt depends on cairo, which didn't support rendering at >10bpc, so I couldn't really test that stuff properly even if I wanted to. Maarten has patches to wire up floats into cairo, but I think he just said that it still kinda uses 8bpc precision only :(

Anyways, the fact that you did icl and I did pre-icl is a pretty good division of labour. Sometimes things work out by accident :)