Message ID | 20190408152702.4153-2-ville.syrjala@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/2] drm/i915: Use mul_u32_u32() more | expand |
Quoting Ville Syrjala (2019-04-08 16:27:02) > - /* > - * Adjust the original formula to delay the division by 2^22 in order to > - * minimize possible rounding errors. > - */ > - tmp = (u64)m1 * m2_int * ref_clock + > - (((u64)m1 * m2_frac * ref_clock) >> 22); > - tmp = div_u64(tmp, 5 * div1 * div2); > - > - return tmp; > + return div_u64(mul_u32_u32(ref_clock * m1, m2), > + (5 * div1 * div2) << 22); You say the denominator here is a u64, so do you not need to cast (u64)(5 * d1 * d2) to ensure it doesn't overflow the shift? Aiui d1, d2 are u32 here. -Chris
On Mon, Apr 08, 2019 at 04:49:13PM +0100, Chris Wilson wrote: > Quoting Ville Syrjala (2019-04-08 16:27:02) > > - /* > > - * Adjust the original formula to delay the division by 2^22 in order to > > - * minimize possible rounding errors. > > - */ > > - tmp = (u64)m1 * m2_int * ref_clock + > > - (((u64)m1 * m2_frac * ref_clock) >> 22); > > - tmp = div_u64(tmp, 5 * div1 * div2); > > - > > - return tmp; > > + return div_u64(mul_u32_u32(ref_clock * m1, m2), > > + (5 * div1 * div2) << 22); > > You say the denominator here is a u64, so do you not need to cast > (u64)(5 * d1 * d2) to ensure it doesn't overflow the shift? It should fit into u32. The maximum value should be <= (5*0xf*0x7)<<22 based on the number of bits available for div1/2.
Quoting Ville Syrjälä (2019-04-08 17:06:01) > On Mon, Apr 08, 2019 at 04:49:13PM +0100, Chris Wilson wrote: > > Quoting Ville Syrjala (2019-04-08 16:27:02) > > > - /* > > > - * Adjust the original formula to delay the division by 2^22 in order to > > > - * minimize possible rounding errors. > > > - */ > > > - tmp = (u64)m1 * m2_int * ref_clock + > > > - (((u64)m1 * m2_frac * ref_clock) >> 22); > > > - tmp = div_u64(tmp, 5 * div1 * div2); > > > - > > > - return tmp; > > > + return div_u64(mul_u32_u32(ref_clock * m1, m2), > > > + (5 * div1 * div2) << 22); > > > > You say the denominator here is a u64, so do you not need to cast > > (u64)(5 * d1 * d2) to ensure it doesn't overflow the shift? > > It should fit into u32. The maximum value should be > <= (5*0xf*0x7)<<22 based on the number of bits available > for div1/2. No worries, I was thinking div_u64 == div_u64_u64 anyway. -Chris
Quoting Ville Syrjälä (2019-04-08 17:06:01) > On Mon, Apr 08, 2019 at 04:49:13PM +0100, Chris Wilson wrote: > > Quoting Ville Syrjala (2019-04-08 16:27:02) > > > - /* > > > - * Adjust the original formula to delay the division by 2^22 in order to > > > - * minimize possible rounding errors. > > > - */ > > > - tmp = (u64)m1 * m2_int * ref_clock + > > > - (((u64)m1 * m2_frac * ref_clock) >> 22); > > > - tmp = div_u64(tmp, 5 * div1 * div2); > > > - > > > - return tmp; > > > + return div_u64(mul_u32_u32(ref_clock * m1, m2), > > > + (5 * div1 * div2) << 22); > > > > You say the denominator here is a u64, so do you not need to cast > > (u64)(5 * d1 * d2) to ensure it doesn't overflow the shift? > > It should fit into u32. The maximum value should be > <= (5*0xf*0x7)<<22 based on the number of bits available 3b * 4b * 3b = 10b. So just fits. Is it worth asserting those limits? Feels like it is running close, and will be subject to cargo-culting. -Chris
On Mon, Apr 08, 2019 at 06:05:06PM +0100, Chris Wilson wrote: > Quoting Ville Syrjälä (2019-04-08 17:06:01) > > On Mon, Apr 08, 2019 at 04:49:13PM +0100, Chris Wilson wrote: > > > Quoting Ville Syrjala (2019-04-08 16:27:02) > > > > - /* > > > > - * Adjust the original formula to delay the division by 2^22 in order to > > > > - * minimize possible rounding errors. > > > > - */ > > > > - tmp = (u64)m1 * m2_int * ref_clock + > > > > - (((u64)m1 * m2_frac * ref_clock) >> 22); > > > > - tmp = div_u64(tmp, 5 * div1 * div2); > > > > - > > > > - return tmp; > > > > + return div_u64(mul_u32_u32(ref_clock * m1, m2), > > > > + (5 * div1 * div2) << 22); > > > > > > You say the denominator here is a u64, so do you not need to cast > > > (u64)(5 * d1 * d2) to ensure it doesn't overflow the shift? > > > > It should fit into u32. The maximum value should be > > <= (5*0xf*0x7)<<22 based on the number of bits available > 3b * 4b * 3b = 10b. So just fits. > > Is it worth asserting those limits? Feels like it is running close, and > will be subject to cargo-culting. I suppose checking for it might be a good idea. Just 'WARN_ON(5 * div1 * div2 >= 1 << 10)' maybe, or were you thinking of something fancier?
Quoting Ville Syrjälä (2019-04-10 18:59:52) > On Mon, Apr 08, 2019 at 06:05:06PM +0100, Chris Wilson wrote: > > Quoting Ville Syrjälä (2019-04-08 17:06:01) > > > On Mon, Apr 08, 2019 at 04:49:13PM +0100, Chris Wilson wrote: > > > > Quoting Ville Syrjala (2019-04-08 16:27:02) > > > > > - /* > > > > > - * Adjust the original formula to delay the division by 2^22 in order to > > > > > - * minimize possible rounding errors. > > > > > - */ > > > > > - tmp = (u64)m1 * m2_int * ref_clock + > > > > > - (((u64)m1 * m2_frac * ref_clock) >> 22); > > > > > - tmp = div_u64(tmp, 5 * div1 * div2); > > > > > - > > > > > - return tmp; > > > > > + return div_u64(mul_u32_u32(ref_clock * m1, m2), > > > > > + (5 * div1 * div2) << 22); > > > > > > > > You say the denominator here is a u64, so do you not need to cast > > > > (u64)(5 * d1 * d2) to ensure it doesn't overflow the shift? > > > > > > It should fit into u32. The maximum value should be > > > <= (5*0xf*0x7)<<22 based on the number of bits available > > 3b * 4b * 3b = 10b. So just fits. > > > > Is it worth asserting those limits? Feels like it is running close, and > > will be subject to cargo-culting. > > I suppose checking for it might be a good idea. > > Just 'WARN_ON(5 * div1 * div2 >= 1 << 10)' maybe, or were you thinking > of something fancier? How about something like struct { unsigned int div1 : 3; unsigned int div2 : 3; } d; then with a bit of luck smatch will spot an overflow, and people might think twice when copying? Even weirder, add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-29 (-29) Function old new delta intel_ddi_get_config 2377 2348 -29 I dread to look into the function to see how that changed gcc's mind. -Chris
On Wed, Apr 10, 2019 at 07:13:48PM +0100, Chris Wilson wrote: > Quoting Ville Syrjälä (2019-04-10 18:59:52) > > On Mon, Apr 08, 2019 at 06:05:06PM +0100, Chris Wilson wrote: > > > Quoting Ville Syrjälä (2019-04-08 17:06:01) > > > > On Mon, Apr 08, 2019 at 04:49:13PM +0100, Chris Wilson wrote: > > > > > Quoting Ville Syrjala (2019-04-08 16:27:02) > > > > > > - /* > > > > > > - * Adjust the original formula to delay the division by 2^22 in order to > > > > > > - * minimize possible rounding errors. > > > > > > - */ > > > > > > - tmp = (u64)m1 * m2_int * ref_clock + > > > > > > - (((u64)m1 * m2_frac * ref_clock) >> 22); > > > > > > - tmp = div_u64(tmp, 5 * div1 * div2); > > > > > > - > > > > > > - return tmp; > > > > > > + return div_u64(mul_u32_u32(ref_clock * m1, m2), > > > > > > + (5 * div1 * div2) << 22); > > > > > > > > > > You say the denominator here is a u64, so do you not need to cast > > > > > (u64)(5 * d1 * d2) to ensure it doesn't overflow the shift? > > > > > > > > It should fit into u32. The maximum value should be > > > > <= (5*0xf*0x7)<<22 based on the number of bits available > > > 3b * 4b * 3b = 10b. So just fits. > > > > > > Is it worth asserting those limits? Feels like it is running close, and > > > will be subject to cargo-culting. > > > > I suppose checking for it might be a good idea. > > > > Just 'WARN_ON(5 * div1 * div2 >= 1 << 10)' maybe, or were you thinking > > of something fancier? > > How about something like > struct { > unsigned int div1 : 3; > unsigned int div2 : 3; > } d; > > then with a bit of luck smatch will spot an overflow, and people might > think twice when copying? > > Even weirder, > add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-29 (-29) > Function old new delta > intel_ddi_get_config 2377 2348 -29 > > I dread to look into the function to see how that changed gcc's mind. I get 48 bytes with a 32bit build, but only if I let it inline that function. With noinline there is no change apart from a few registers getting shuffled around. gcc works in mysterious ways.
diff --git a/drivers/gpu/drm/i915/intel_ddi.c b/drivers/gpu/drm/i915/intel_ddi.c index 720de7a3f5e6..0a24608ce49b 100644 --- a/drivers/gpu/drm/i915/intel_ddi.c +++ b/drivers/gpu/drm/i915/intel_ddi.c @@ -1386,8 +1386,7 @@ static int icl_calc_tbt_pll_link(struct drm_i915_private *dev_priv, static int icl_calc_mg_pll_link(struct drm_i915_private *dev_priv, const struct intel_dpll_hw_state *pll_state) { - u32 m1, m2_int, m2_frac, div1, div2, ref_clock; - u64 tmp; + u32 m1, m2, m2_int, m2_frac, div1, div2, ref_clock; ref_clock = dev_priv->cdclk.hw.ref; @@ -1396,6 +1395,7 @@ static int icl_calc_mg_pll_link(struct drm_i915_private *dev_priv, m2_frac = (pll_state->mg_pll_div0 & MG_PLL_DIV0_FRACNEN_H) ? (pll_state->mg_pll_div0 & MG_PLL_DIV0_FBDIV_FRAC_MASK) >> MG_PLL_DIV0_FBDIV_FRAC_SHIFT : 0; + m2 = (m2_int << 22) | m2_frac; switch (pll_state->mg_clktop2_hsclkctl & MG_CLKTOP2_HSCLKCTL_HSDIV_RATIO_MASK) { @@ -1424,15 +1424,8 @@ static int icl_calc_mg_pll_link(struct drm_i915_private *dev_priv, if (div2 == 0) div2 = 1; - /* - * Adjust the original formula to delay the division by 2^22 in order to - * minimize possible rounding errors. - */ - tmp = (u64)m1 * m2_int * ref_clock + - (((u64)m1 * m2_frac * ref_clock) >> 22); - tmp = div_u64(tmp, 5 * div1 * div2); - - return tmp; + return div_u64(mul_u32_u32(ref_clock * m1, m2), + (5 * div1 * div2) << 22); } static void ddi_dotclock_get(struct intel_crtc_state *pipe_config) diff --git a/drivers/gpu/drm/i915/intel_dpll_mgr.c b/drivers/gpu/drm/i915/intel_dpll_mgr.c index 29edc369920b..3098c783a615 100644 --- a/drivers/gpu/drm/i915/intel_dpll_mgr.c +++ b/drivers/gpu/drm/i915/intel_dpll_mgr.c @@ -2682,7 +2682,7 @@ static bool icl_calc_mg_pll_state(struct intel_crtc_state *crtc_state) } m2div_rem = dco_khz % (refclk_khz * m1div); - tmp = (u64)m2div_rem * (1 << 22); + tmp = (u64)m2div_rem << 22; do_div(tmp, refclk_khz * m1div); m2div_frac = tmp;