Message ID | 20140420202933.78b3e355.cand@gmx.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Am 20.04.2014 19:29, schrieb Lauri Kasanen: > This was originally un-inlined by Andi Kleen in 2011 citing size concerns. > Indeed, a first attempt at inlining it grew radeon.ko by 7%. > > However, 2% of cpu is spent in this function. Simply inlining it gave 1% more fps > in Urban Terror. > > v2: We know the minimum MMIO size. Adding it to the if allows the compiler to > optimize the branch out, improving both performance and size. > > The v2 patch decreases radeon.ko size by 2%. I didn't re-benchmark, but common sense > says perf is now more than 1% better. > > v3: Also change _wreg, make the threshold a define. > > Inlining _wreg increased the size a bit compared to v2, so now radeon.ko > is only 1% smaller. > > Signed-off-by: Lauri Kasanen <cand@gmx.com> Reviewed-by: Christian König <christian.koenig@amd.com> > --- > drivers/gpu/drm/radeon/r100.c | 33 --------------------------------- > drivers/gpu/drm/radeon/radeon.h | 40 ++++++++++++++++++++++++++++++++++++---- > 2 files changed, 36 insertions(+), 37 deletions(-) > > diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c > index b6c3264..a4e7871 100644 > --- a/drivers/gpu/drm/radeon/r100.c > +++ b/drivers/gpu/drm/radeon/r100.c > @@ -4086,39 +4086,6 @@ int r100_init(struct radeon_device *rdev) > return 0; > } > > -uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg, > - bool always_indirect) > -{ > - if (reg < rdev->rmmio_size && !always_indirect) > - return readl(((void __iomem *)rdev->rmmio) + reg); > - else { > - unsigned long flags; > - uint32_t ret; > - > - spin_lock_irqsave(&rdev->mmio_idx_lock, flags); > - writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX); > - ret = readl(((void __iomem *)rdev->rmmio) + RADEON_MM_DATA); > - spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags); > - > - return ret; > - } > -} > - > -void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg, uint32_t v, > - bool always_indirect) > -{ > - if (reg < rdev->rmmio_size && !always_indirect) > - writel(v, ((void __iomem *)rdev->rmmio) + reg); > - else { > - unsigned long flags; > - > - spin_lock_irqsave(&rdev->mmio_idx_lock, flags); > - writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX); > - writel(v, ((void __iomem *)rdev->rmmio) + RADEON_MM_DATA); > - spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags); > - } > -} > - > u32 r100_io_rreg(struct radeon_device *rdev, u32 reg) > { > if (reg < rdev->rio_mem_size) > diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h > index f21db7a..a749b6c 100644 > --- a/drivers/gpu/drm/radeon/radeon.h > +++ b/drivers/gpu/drm/radeon/radeon.h > @@ -2328,10 +2328,42 @@ int radeon_device_init(struct radeon_device *rdev, > void radeon_device_fini(struct radeon_device *rdev); > int radeon_gpu_wait_for_idle(struct radeon_device *rdev); > > -uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg, > - bool always_indirect); > -void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg, uint32_t v, > - bool always_indirect); > +#define RADEON_MIN_MMIO_SIZE 0x10000 > + > +static inline uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg, > + bool always_indirect) > +{ > + /* The mmio size is 64kb at minimum. Allows the if to be optimized out. */ > + if ((reg < rdev->rmmio_size || reg < RADEON_MIN_MMIO_SIZE) && !always_indirect) > + return readl(((void __iomem *)rdev->rmmio) + reg); > + else { > + unsigned long flags; > + uint32_t ret; > + > + spin_lock_irqsave(&rdev->mmio_idx_lock, flags); > + writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX); > + ret = readl(((void __iomem *)rdev->rmmio) + RADEON_MM_DATA); > + spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags); > + > + return ret; > + } > +} > + > +static inline void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg, uint32_t v, > + bool always_indirect) > +{ > + if ((reg < rdev->rmmio_size || reg < RADEON_MIN_MMIO_SIZE) && !always_indirect) > + writel(v, ((void __iomem *)rdev->rmmio) + reg); > + else { > + unsigned long flags; > + > + spin_lock_irqsave(&rdev->mmio_idx_lock, flags); > + writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX); > + writel(v, ((void __iomem *)rdev->rmmio) + RADEON_MM_DATA); > + spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags); > + } > +} > + > u32 r100_io_rreg(struct radeon_device *rdev, u32 reg); > void r100_io_wreg(struct radeon_device *rdev, u32 reg, u32 v); >
On Sun, 20 Apr 2014 19:41:11 +0200 Christian König <deathsimple@vodafone.de> wrote: > Am 20.04.2014 19:29, schrieb Lauri Kasanen: > > This was originally un-inlined by Andi Kleen in 2011 citing size concerns. > > Indeed, a first attempt at inlining it grew radeon.ko by 7%. > > > > However, 2% of cpu is spent in this function. Simply inlining it gave 1% more fps > > in Urban Terror. > > > > v2: We know the minimum MMIO size. Adding it to the if allows the compiler to > > optimize the branch out, improving both performance and size. > > > > The v2 patch decreases radeon.ko size by 2%. I didn't re-benchmark, but common sense > > says perf is now more than 1% better. > > > > v3: Also change _wreg, make the threshold a define. > > > > Inlining _wreg increased the size a bit compared to v2, so now radeon.ko > > is only 1% smaller. > > > > Signed-off-by: Lauri Kasanen <cand@gmx.com> > > Reviewed-by: Christian König <christian.koenig@amd.com> Ping. Although reviewed in April, seems this wasn't applied to any tree? - Lauri
Am 10.07.2014 10:48, schrieb Lauri Kasanen: > On Sun, 20 Apr 2014 19:41:11 +0200 > Christian König <deathsimple@vodafone.de> wrote: > >> Am 20.04.2014 19:29, schrieb Lauri Kasanen: >>> This was originally un-inlined by Andi Kleen in 2011 citing size concerns. >>> Indeed, a first attempt at inlining it grew radeon.ko by 7%. >>> >>> However, 2% of cpu is spent in this function. Simply inlining it gave 1% more fps >>> in Urban Terror. >>> >>> v2: We know the minimum MMIO size. Adding it to the if allows the compiler to >>> optimize the branch out, improving both performance and size. >>> >>> The v2 patch decreases radeon.ko size by 2%. I didn't re-benchmark, but common sense >>> says perf is now more than 1% better. >>> >>> v3: Also change _wreg, make the threshold a define. >>> >>> Inlining _wreg increased the size a bit compared to v2, so now radeon.ko >>> is only 1% smaller. >>> >>> Signed-off-by: Lauri Kasanen <cand@gmx.com> >> Reviewed-by: Christian König <christian.koenig@amd.com> > Ping. Although reviewed in April, seems this wasn't applied to any tree? Sorry looks like I missed it. Alex can you pull that in your 3.17 branch? Thanks, Christian. > > - Lauri
On Thu, Jul 10, 2014 at 4:55 AM, Christian König <deathsimple@vodafone.de> wrote: > Am 10.07.2014 10:48, schrieb Lauri Kasanen: > >> On Sun, 20 Apr 2014 19:41:11 +0200 >> Christian König <deathsimple@vodafone.de> wrote: >> >>> Am 20.04.2014 19:29, schrieb Lauri Kasanen: >>>> >>>> This was originally un-inlined by Andi Kleen in 2011 citing size >>>> concerns. >>>> Indeed, a first attempt at inlining it grew radeon.ko by 7%. >>>> >>>> However, 2% of cpu is spent in this function. Simply inlining it gave 1% >>>> more fps >>>> in Urban Terror. >>>> >>>> v2: We know the minimum MMIO size. Adding it to the if allows the >>>> compiler to >>>> optimize the branch out, improving both performance and size. >>>> >>>> The v2 patch decreases radeon.ko size by 2%. I didn't re-benchmark, but >>>> common sense >>>> says perf is now more than 1% better. >>>> >>>> v3: Also change _wreg, make the threshold a define. >>>> >>>> Inlining _wreg increased the size a bit compared to v2, so now radeon.ko >>>> is only 1% smaller. >>>> >>>> Signed-off-by: Lauri Kasanen <cand@gmx.com> >>> >>> Reviewed-by: Christian König <christian.koenig@amd.com> >> >> Ping. Although reviewed in April, seems this wasn't applied to any tree? > > > Sorry looks like I missed it. Alex can you pull that in your 3.17 branch? Applied. thanks. Alex
diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c index b6c3264..a4e7871 100644 --- a/drivers/gpu/drm/radeon/r100.c +++ b/drivers/gpu/drm/radeon/r100.c @@ -4086,39 +4086,6 @@ int r100_init(struct radeon_device *rdev) return 0; } -uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg, - bool always_indirect) -{ - if (reg < rdev->rmmio_size && !always_indirect) - return readl(((void __iomem *)rdev->rmmio) + reg); - else { - unsigned long flags; - uint32_t ret; - - spin_lock_irqsave(&rdev->mmio_idx_lock, flags); - writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX); - ret = readl(((void __iomem *)rdev->rmmio) + RADEON_MM_DATA); - spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags); - - return ret; - } -} - -void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg, uint32_t v, - bool always_indirect) -{ - if (reg < rdev->rmmio_size && !always_indirect) - writel(v, ((void __iomem *)rdev->rmmio) + reg); - else { - unsigned long flags; - - spin_lock_irqsave(&rdev->mmio_idx_lock, flags); - writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX); - writel(v, ((void __iomem *)rdev->rmmio) + RADEON_MM_DATA); - spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags); - } -} - u32 r100_io_rreg(struct radeon_device *rdev, u32 reg) { if (reg < rdev->rio_mem_size) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index f21db7a..a749b6c 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -2328,10 +2328,42 @@ int radeon_device_init(struct radeon_device *rdev, void radeon_device_fini(struct radeon_device *rdev); int radeon_gpu_wait_for_idle(struct radeon_device *rdev); -uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg, - bool always_indirect); -void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg, uint32_t v, - bool always_indirect); +#define RADEON_MIN_MMIO_SIZE 0x10000 + +static inline uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg, + bool always_indirect) +{ + /* The mmio size is 64kb at minimum. Allows the if to be optimized out. */ + if ((reg < rdev->rmmio_size || reg < RADEON_MIN_MMIO_SIZE) && !always_indirect) + return readl(((void __iomem *)rdev->rmmio) + reg); + else { + unsigned long flags; + uint32_t ret; + + spin_lock_irqsave(&rdev->mmio_idx_lock, flags); + writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX); + ret = readl(((void __iomem *)rdev->rmmio) + RADEON_MM_DATA); + spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags); + + return ret; + } +} + +static inline void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg, uint32_t v, + bool always_indirect) +{ + if ((reg < rdev->rmmio_size || reg < RADEON_MIN_MMIO_SIZE) && !always_indirect) + writel(v, ((void __iomem *)rdev->rmmio) + reg); + else { + unsigned long flags; + + spin_lock_irqsave(&rdev->mmio_idx_lock, flags); + writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX); + writel(v, ((void __iomem *)rdev->rmmio) + RADEON_MM_DATA); + spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags); + } +} + u32 r100_io_rreg(struct radeon_device *rdev, u32 reg); void r100_io_wreg(struct radeon_device *rdev, u32 reg, u32 v);
This was originally un-inlined by Andi Kleen in 2011 citing size concerns. Indeed, a first attempt at inlining it grew radeon.ko by 7%. However, 2% of cpu is spent in this function. Simply inlining it gave 1% more fps in Urban Terror. v2: We know the minimum MMIO size. Adding it to the if allows the compiler to optimize the branch out, improving both performance and size. The v2 patch decreases radeon.ko size by 2%. I didn't re-benchmark, but common sense says perf is now more than 1% better. v3: Also change _wreg, make the threshold a define. Inlining _wreg increased the size a bit compared to v2, so now radeon.ko is only 1% smaller. Signed-off-by: Lauri Kasanen <cand@gmx.com> --- drivers/gpu/drm/radeon/r100.c | 33 --------------------------------- drivers/gpu/drm/radeon/radeon.h | 40 ++++++++++++++++++++++++++++++++++++---- 2 files changed, 36 insertions(+), 37 deletions(-)