diff mbox

Workaround for flicker with panning on the i830

Message ID 527D024D.8020000@math.tu-berlin.de (mailing list archive)
State New, archived
Headers show

Commit Message

Thomas Richter Nov. 8, 2013, 3:25 p.m. UTC
Hi Daniel, dear intel-experts,

please find a revised patch attached that addresses the flicker with 
panning on the i830 chipset. This patch has now been tested
on various screen layouts and seems to be quite reliable, i.e. I haven't 
seen the flicker since.

Unfortunately, I have not been able to find a good workaround for the 
same problem on a tiled framebuffer, the pattern there, i.e. when
the flicker appears, is quite irregular, and it is unclear how to 
address it. The situation is even worse for 8bit/pixel framebuffers where,
for some panning positions, the display remains completely blank. It 
flickers once or twice, then gets a hick-up, and then stays off.

The patch is currently only active on the i830, and only on pipelines 
using the VGA or DVO output. Strangely enough, LVDS does not
seem to be affected, so maybe it is some memory/prefetch related 
problem. I also checked the debug output, though I found no
suspicious activity while the screen flickers or is off. For a linear 
framebuffer, it seems to be enough to position the start of the
pipeline ahead of the desired position, wait one vertical blank, and 
then adjust it to the correct position. For tiling or 8bit modes,
this does not work.

Sorry if the formatting is off. This is just what emacs left me with. 
Please feel free to reformat as you prefer.

As a related question: Is there possibly a command line tool that would 
allow me to modify the intel chipset registers on the fly without going 
through a kernel recompile? It would make some experiments just so much 
simpler.

Greetings,
     Thomas

Comments

Thomas Richter Nov. 11, 2013, 3:33 p.m. UTC | #1
Am 08.11.2013 17:32, schrieb Daniel Vetter:
> Kernel has a tool in scripts/checkpatch.pl which will tell you what's all
> off ;-) Also sob line and similar essential things are missing, but the
> script should notice this all.
>
> Also I think it'd be good to extract this hack into a little helper
> function, maybe called i830_plane_panning_hack or so. That way it's out of
> the normal code flow and clearly marked as something exceptionel.

Thanks, I'll try that. Still working on it, though.
>
>> As a related question: Is there possibly a command line tool that
>> would allow me to modify the intel chipset registers on the fly
>> without going through a kernel recompile? It would make some
>> experiments just so much simpler.
> intel-gpu-tools has intel_reg_read/write tools. That should help ;-)


These tools are exceptional, and exactly what I was looking for. Thanks 
a lot!

Now, how much is known about the register DSPARB, found at offset 0x70030?

Because, if I just feed this register with "correct" values (for 
whatever "correct" means), I do get a stable image
on pipe A and pipe B. I haven't found out what "correct" is, precisely, 
but there is something I can do here.
By default, on my setup, the value here is 0x17e5f, but depending on the 
scroll position, 0x17e51, 0x17e61, 0x17e6f
or 0x17e63 create stable images. So there is at least some way how to 
get the image right, though I don't quite
understand the mechanics yet.

BTW, now that I looked at intel_regs_dump: When the screen is 
flickering, I do get "pipe A underruns". I tried working
with the watermark registers, but all I can get is a crashed GPU or a 
ripped display, so this is not quite the right approach.
DSPARB seems to be the best trick so far, even though I don't quite get 
how it works.

Any ideas?

Greetings,
     Thomas
Daniel Vetter Nov. 11, 2013, 3:43 p.m. UTC | #2
On Mon, Nov 11, 2013 at 4:33 PM, Thomas Richter <thor@math.tu-berlin.de> wrote:
> Now, how much is known about the register DSPARB, found at offset 0x70030?
>
> Because, if I just feed this register with "correct" values (for whatever
> "correct" means), I do get a stable image
> on pipe A and pipe B. I haven't found out what "correct" is, precisely, but
> there is something I can do here.
> By default, on my setup, the value here is 0x17e5f, but depending on the
> scroll position, 0x17e51, 0x17e61, 0x17e6f
> or 0x17e63 create stable images. So there is at least some way how to get
> the image right, though I don't quite
> understand the mechanics yet.
>
> BTW, now that I looked at intel_regs_dump: When the screen is flickering, I
> do get "pipe A underruns". I tried working
> with the watermark registers, but all I can get is a crashed GPU or a ripped
> display, so this is not quite the right approach.
> DSPARB seems to be the best trick so far, even though I don't quite get how
> it works.

Oh, that's really interesting. gen2 has a unified display fifo on
machines that support 2 outputs. DSPARB tells the hw how to exactly
split this up between the two pipes. There are two bit ranges of
interest here:

bits0-8: "AEND. This field selects one of the splits between two of
the three portions of the display RAM allocated
between display plane A, plane B, and display plane C. This field can
only be programmed when one of
the two pipes is disabled. When it is written. It takes effect on the
next VBLANK for whichever pipe is
currently active. The control granularity is 16-bytes.
The total size of the Almador-M RAM is 288*16 bytes.
The total size of the Montara-GM RAM is 256*16 bytes."

bits 9-18: "BEND. This field selects one of the split between the two
of the three portions of the display RAM
allocated between display planes A, B, and C. This field can only be
programmed when one of the two
display pipes is disabled. It takes effect on the next VBLANK for
whichever pipe is currently active. It
must be programmed to a number larger than the value in AEND.
The control granularity is 16-bytes and the total size of the
Almador-M RAM is 288*16 bytes and the
Montara-GM RAM is 256*16 bytes."

Almador-M is i830M, Montara-GM is i855GM. Other gen2 chips have a
different meaning for this register.

What we'd need to do here is to update this register when switching
the number of active display pipes in the ->modeset_global_resources
hook. We also need to make sure we have updated watermark values set
up already, before rewriting the value of DSPARB (since the watermarks
depend upon the size of the fifo).

For I start I'd go with splitting the fifo according to the display
clock between plane A and B and giving nothing to plane C. We don't
have any code to use plane C so giving everything to just A and B is
better.

I hope this helps you in your endeavor. Otherwise feel free to ping me
on irc or mail.
-Daniel
Thomas Richter Nov. 12, 2013, 4:41 p.m. UTC | #3
Am 11.11.2013 16:43, schrieb Daniel Vetter:
> Oh, that's really interesting. gen2 has a unified display fifo on
> machines that support 2 outputs. DSPARB tells the hw how to exactly
> split this up between the two pipes. There are two bit ranges of
> interest here:

/* snip */

Hmm, why I understand *that* it does make a difference, I do not 
understand the details.
By a unified display fifo do you mean that the display output has an 
internal buffer memory (the fifo) which
basically feeds the the DVOs or the LVDS with memory, which comes via 
DMA into the fifo. Is that right?

By "split", do you mean that a fixed amount of bytes (or rather, lines 
as in multiples of 16 bytes) are allocated for each
participating pipe?

Simply enlarging the fifo does not help (i.e. writing a larger value 
into the register). Just the positions where I get the flicker change, 
but the problem does not go away. So whatever needs to be done is to 
adjust this register according to the alignment of the base address of 
the corresponding DMA engine that feeds the pipe.


> What we'd need to do here is to update this register when switching
> the number of active display pipes in the ->modeset_global_resources
> hook. We also need to make sure we have updated watermark values set
> up already, before rewriting the value of DSPARB (since the watermarks
> depend upon the size of the fifo).
>
> For I start I'd go with splitting the fifo according to the display
> clock between plane A and B and giving nothing to plane C. We don't
> have any code to use plane C so giving everything to just A and B is
> better.

By that you mean "BEND = maximum" and "AEND" in between? That does not 
seem to be sufficient. It needs to be modified according to the buffer 
alignment, and sometimes smaller values work, sometimes larger ones.

Greetings,
     Thomas
Daniel Vetter Nov. 12, 2013, 5:22 p.m. UTC | #4
On Tue, Nov 12, 2013 at 05:41:12PM +0100, Thomas Richter wrote:
> Am 11.11.2013 16:43, schrieb Daniel Vetter:
> >Oh, that's really interesting. gen2 has a unified display fifo on
> >machines that support 2 outputs. DSPARB tells the hw how to exactly
> >split this up between the two pipes. There are two bit ranges of
> >interest here:
> 
> /* snip */
> 
> Hmm, why I understand *that* it does make a difference, I do not
> understand the details.
> By a unified display fifo do you mean that the display output has an
> internal buffer memory (the fifo) which
> basically feeds the the DVOs or the LVDS with memory, which comes
> via DMA into the fifo. Is that right?

Yup. The fifo is there ot paper over memory fetch latencies. The memory
controller sends 64b blocks, but the display output needs to be continous.

> By "split", do you mean that a fixed amount of bytes (or rather,
> lines as in multiples of 16 bytes) are allocated for each
> participating pipe?

Yup.

> Simply enlarging the fifo does not help (i.e. writing a larger value
> into the register). Just the positions where I get the flicker
> change, but the problem does not go away. So whatever needs to be
> done is to adjust this register according to the alignment of the
> base address of the corresponding DMA engine that feeds the pipe.

Hm, that's strange.

> >What we'd need to do here is to update this register when switching
> >the number of active display pipes in the ->modeset_global_resources
> >hook. We also need to make sure we have updated watermark values set
> >up already, before rewriting the value of DSPARB (since the watermarks
> >depend upon the size of the fifo).
> >
> >For I start I'd go with splitting the fifo according to the display
> >clock between plane A and B and giving nothing to plane C. We don't
> >have any code to use plane C so giving everything to just A and B is
> >better.
> 
> By that you mean "BEND = maximum" and "AEND" in between? That does
> not seem to be sufficient. It needs to be modified according to the
> buffer alignment, and sometimes smaller values work, sometimes
> larger ones.

Yeah, I've meant BEND = max and AEND split so that the two resulting sizes
are proportional to the pixel clock (i.e. both pipes should take equal
amount of time roughly to go through the full fifo). Indeed really strang
that this doesn't seem to work.

Looking at docs I don't see any mention of a w/a :(

The only (totally crazy) idea I have is that the fifo falls over if it
fetches accross a tile/page boundary. But no idea how to figure out a
formula that'd work everywhere ...

To simplify things I'd start with just just one pipe display to VGA and
then going through different resolutions and different offset. Maybe
there's a pattern in the display fifo lenghts that work.

Happy hacking!

Cheers, Daniel
Thomas Richter Nov. 13, 2013, 7:50 p.m. UTC | #5
On 12.11.2013 18:22, Daniel Vetter wrote:

Thanks for the explanation how the fifos work, that was helpful.


> Yeah, I've meant BEND = max and AEND split so that the two resulting sizes
> are proportional to the pixel clock (i.e. both pipes should take equal
> amount of time roughly to go through the full fifo). Indeed really strang
> that this doesn't seem to work.
>
> Looking at docs I don't see any mention of a w/a :(

Too bad. I tried to find some systematic in the settings where I do get 
flicker and where not, in specific which settings of AEND create the 
problem. However, it's more complicated than I thought. It not only 
depends on the current scroll position, but also on the history (!) of 
what you did before. It's probably the relative fill position of the 
FIFOs that plays the role here. If the fifo was relatively full before 
starting the scroll, everything remains fine. Otherwise, it runs dry too 
soon. If you keep the history consistent, the outcome is consistent, but 
otherwise results are hard to predict.

Maybe the whole approach of adjusting the start address of the DMA 
engine to realize scrolling is not quite right, and the engine prefers 
to have data aligned to 64 byte boundaries or something like this. Is 
there some other way to delay the output of the FIFO into the pipe? Some 
very antique systems (ehem, like the Atari 800 or Amiga chipsets ;-) had 
"horizontal scroll registers" that realized a pixel delay for an 
otherwise byte-oriented pipeline.

Is there something like this in the i830 such that the actual start 
address of the DMA engine would remain constant, but the pipeline would 
refer or disregard the first n elements? This would avoid the whole problem.

The trouble is not only cosmetical (the flicker): If that happens, any 
application that uses an X video overlay misbehaives (xine crashes the 
system really badly) so the pipe-A underrun should be avoided at all costs.

Concerning the intel gpu tools: Currently, they just brute-force "poke" 
into the display registers. Is there some way to synchronize this 
activity to vblank from user space (I got their sources, so modifying 
them would work, but I would need some kind of mechanism to wait for 
vblank to do that savely).

> The only (totally crazy) idea I have is that the fifo falls over if it
> fetches accross a tile/page boundary. But no idea how to figure out a
> formula that'd work everywhere ...

Exactly. Might be possible, given that I have a relatively easy fix for 
sequential, but not for tiled. If I may add another question: How 
exactly does the tiled access then work? As far as I understand, the 
video RAM is still organized linearly, but the DMA engine fetches data 
differently? Is this right? How exactly are the tiles laid out, and how 
does memory fetches then work?

> To simplify things I'd start with just just one pipe display to VGA and
> then going through different resolutions and different offset. Maybe
> there's a pattern in the display fifo lenghts that work.

See above. Nope. Or rather "to some extend", but it's more complicated 
than that.

I wonder how that works in windows, though. The intel driver does 
support panning to some degree, at least if the resolutions of the 
internal and external display differ.

> Happy hacking!

I'm certainly having fun here. (-:

Greetings,
	Thomas
Daniel Vetter Nov. 13, 2013, 8:20 p.m. UTC | #6
On Wed, Nov 13, 2013 at 08:50:50PM +0100, Thomas Richter wrote:
> On 12.11.2013 18:22, Daniel Vetter wrote:
> 
> Thanks for the explanation how the fifos work, that was helpful.
> 
> 
> >Yeah, I've meant BEND = max and AEND split so that the two resulting sizes
> >are proportional to the pixel clock (i.e. both pipes should take equal
> >amount of time roughly to go through the full fifo). Indeed really strang
> >that this doesn't seem to work.
> >
> >Looking at docs I don't see any mention of a w/a :(
> 
> Too bad. I tried to find some systematic in the settings where I do
> get flicker and where not, in specific which settings of AEND create
> the problem. However, it's more complicated than I thought. It not
> only depends on the current scroll position, but also on the history
> (!) of what you did before. It's probably the relative fill position
> of the FIFOs that plays the role here. If the fifo was relatively
> full before starting the scroll, everything remains fine. Otherwise,
> it runs dry too soon. If you keep the history consistent, the
> outcome is consistent, but otherwise results are hard to predict.

Hm, scrolling should be vsynced, and I'd expect the fifo to properly
refill in the vblank. So this is pretty astonishing. Could it be that
rendering activity affects this, or is this always with an otherwise
completely idle desktop?

> Maybe the whole approach of adjusting the start address of the DMA
> engine to realize scrolling is not quite right, and the engine
> prefers to have data aligned to 64 byte boundaries or something like
> this. Is there some other way to delay the output of the FIFO into
> the pipe? Some very antique systems (ehem, like the Atari 800 or
> Amiga chipsets ;-) had "horizontal scroll registers" that realized a
> pixel delay for an otherwise byte-oriented pipeline.
> 
> Is there something like this in the i830 such that the actual start
> address of the DMA engine would remain constant, but the pipeline
> would refer or disregard the first n elements? This would avoid the
> whole problem.

Nope, we don't have a buffer offset in pixels like on later generations.

> The trouble is not only cosmetical (the flicker): If that happens,
> any application that uses an X video overlay misbehaives (xine
> crashes the system really badly) so the pipe-A underrun should be
> avoided at all costs.

Yeah, the overlay is extremely fickle and if it's dead it'll never work
again until you reboot. And once it crashed all Xv apps won't work any
more :(

> Concerning the intel gpu tools: Currently, they just brute-force
> "poke" into the display registers. Is there some way to synchronize
> this activity to vblank from user space (I got their sources, so
> modifying them would work, but I would need some kind of mechanism
> to wait for vblank to do that savely).

The registers are vblank synced already. So if you poke both the dsparb
and the plane base register from userspace to fake scrolling you should
have as good a vsync as the kernel will get (if you script both register
writes together ofc).
> 
> >The only (totally crazy) idea I have is that the fifo falls over if it
> >fetches accross a tile/page boundary. But no idea how to figure out a
> >formula that'd work everywhere ...
> 
> Exactly. Might be possible, given that I have a relatively easy fix
> for sequential, but not for tiled. If I may add another question:
> How exactly does the tiled access then work? As far as I understand,
> the video RAM is still organized linearly, but the DMA engine
> fetches data differently? Is this right? How exactly are the tiles
> laid out, and how does memory fetches then work?

Tile buffers aren't linear any more in memory. Tiles are 2kb in size and
are laid out in x-major direction. The tile itself is  128 bytes wide and
16 lines high. So presuming you start scanning out out at offset 0 the
dma-engine will read:

0 - 127, 2k - (2k+127), 4k - (4k+127), ... for the first display lane
128 - 255, (2k+128) - (2k+255), (4k+128) - (4k+255), ... for the 2nd display lane
...

> >To simplify things I'd start with just just one pipe display to VGA and
> >then going through different resolutions and different offset. Maybe
> >there's a pattern in the display fifo lenghts that work.
> 
> See above. Nope. Or rather "to some extend", but it's more
> complicated than that.
> 
> I wonder how that works in windows, though. The intel driver does
> support panning to some degree, at least if the resolutions of the
> internal and external display differ.

Could be that windows scrolls in software by copying the buffer around.
Maybe the tearfree option can help here, although I expect it to hurt
badly for gtt constrained i830M platforms. And tbh I don't know whether it
works with panning.

Cheers, Daniel
Thomas Richter Nov. 14, 2013, 7:14 a.m. UTC | #7
On 13.11.2013 21:20, Daniel Vetter wrote:

Thanks Daniel for your explanations. As always, very helpful.

>
> Tile buffers aren't linear any more in memory. Tiles are 2kb in size and
> are laid out in x-major direction. The tile itself is  128 bytes wide and
> 16 lines high. So presuming you start scanning out out at offset 0 the
> dma-engine will read:
>
> 0 - 127, 2k - (2k+127), 4k - (4k+127), ... for the first display lane
> 128 - 255, (2k+128) - (2k+255), (4k+128) - (4k+255), ... for the 2nd display lane
> ...

Hmm, then there is something I don't quite understand. If I check the 
code for the plane buffer start address computation, I do indeed see 
something like tile addressing for GEN4 and up, but for GEN2, I simply 
see a linear address = x + y * stride. This does not look consistent:

/* snip : code from i9xx_update_plane() in i915_display.c */

         linear_offset = y * fb->pitches[0] + x * (fb->bits_per_pixel / 8);

         if (INTEL_INFO(dev)->gen >= 4) {
                 intel_crtc->dspaddr_offset =
                         intel_gen4_compute_page_offset(&x, &y, 
obj->tiling_mode,
 
fb->bits_per_pixel / 8,
                                                        fb->pitches[0]);
                 linear_offset -= intel_crtc->dspaddr_offset;
         } else {
                 intel_crtc->dspaddr_offset = linear_offset;
         }

/* snip */

Are you *really sure* GEN2 does have tiling? Or could it be that this 
bit is used for something else and probably turns on some weird 
powersaving feature that creates some mischief with the FIFO? After all, 
a tile cannot always be 128 pixels long independent of the display 
organization (i.e. "pixel format") *and* have the above formula correct?

There is then at least something I must be missing.

>> I wonder how that works in windows, though. The intel driver does
>> support panning to some degree, at least if the resolutions of the
>> internal and external display differ.
>
> Could be that windows scrolls in software by copying the buffer around.
> Maybe the tearfree option can help here, although I expect it to hurt
> badly for gtt constrained i830M platforms. And tbh I don't know whether it
> works with panning.

Actually, scrolling is quite smooth, I doubt it's a software scrolling. 
But anyhow...

Greetings,
	Thomas
Daniel Vetter Nov. 14, 2013, 8:21 a.m. UTC | #8
On Thu, Nov 14, 2013 at 8:14 AM, Thomas Richter <thor@math.tu-berlin.de> wrote:
> Are you *really sure* GEN2 does have tiling? Or could it be that this bit is
> used for something else and probably turns on some weird powersaving feature
> that creates some mischief with the FIFO? After all, a tile cannot always be
> 128 pixels long independent of the display organization (i.e. "pixel
> format") *and* have the above formula correct?
>
> There is then at least something I must be missing.

On gen2/3 the fence registers make a tile range look linear to both
the gpu and the cpu. On gen4+ the fence registers are only for access
with the cpu, and everything else needs to take tiling into account
explicitly (and there are bits in the registers to tell the gpu that
something is tiled). See the various functions with fence in their
name in i915_gem.c for how this is set up/tracked.
-Daniel
Thomas Richter Nov. 14, 2013, 6:15 p.m. UTC | #9
On 14.11.2013 09:21, Daniel Vetter wrote:

> On gen2/3 the fence registers make a tile range look linear to both
> the gpu and the cpu. On gen4+ the fence registers are only for access
> with the cpu, and everything else needs to take tiling into account
> explicitly (and there are bits in the registers to tell the gpu that
> something is tiled). See the various functions with fence in their
> name in i915_gem.c for how this is set up/tracked.

Hmm. Probably I still don't quite understand. Memory is shared between 
CPU and GPU, so why does a memory write or read by the CPU depend on the 
GPU programming? The GPU is, after all, not a MMU that manipulates the 
address bus of the CPU (or does it?)

If the start address of the display is altered, are the tiles always 
relative to this start, i.e. at the same absolute display position, or 
do they move on screen, i.e. are at the same absolute memory position? 
Would it possibly make sense to check whether modifying the fence 
registers avoids the problem because then the tiles move, and probably 
stay aligned to 16-byte (or 32-byte) boundaries?

Greetings,
	Thomas
Daniel Vetter Nov. 14, 2013, 6:33 p.m. UTC | #10
On Thu, Nov 14, 2013 at 07:15:52PM +0100, Thomas Richter wrote:
> On 14.11.2013 09:21, Daniel Vetter wrote:
> 
> >On gen2/3 the fence registers make a tile range look linear to both
> >the gpu and the cpu. On gen4+ the fence registers are only for access
> >with the cpu, and everything else needs to take tiling into account
> >explicitly (and there are bits in the registers to tell the gpu that
> >something is tiled). See the various functions with fence in their
> >name in i915_gem.c for how this is set up/tracked.
> 
> Hmm. Probably I still don't quite understand. Memory is shared
> between CPU and GPU, so why does a memory write or read by the CPU
> depend on the GPU programming? The GPU is, after all, not a MMU that
> manipulates the address bus of the CPU (or does it?)

The big pci bar in the gpu _is_ an iommu ;-)

> If the start address of the display is altered, are the tiles always
> relative to this start, i.e. at the same absolute display position,
> or do they move on screen, i.e. are at the same absolute memory
> position? Would it possibly make sense to check whether modifying
> the fence registers avoids the problem because then the tiles move,
> and probably stay aligned to 16-byte (or 32-byte) boundaries?

Tiled areas have rather strict alignement constraints. The stride must be
a power of two, the size also and the start needs to be aligned to the
size. So you can't move the tiles. Also moving the tiles would totally
wreak havoc with the screen contents ...
-Daniel
Thomas Richter Nov. 15, 2013, 1:16 p.m. UTC | #11
Hi Daniel, hi others,

did even more experiments. I guess I understand now better. Indeed, the 
trouble seems to be the watermark levels. I played more
with all that, and the real culprit seems to be the FW_BLC register 
controlling the watermarks.

On the i830 with the current settings, it is defined to be 0x1080304 
which sets the watermark a bit too low. If I set it to
0x1080306 instead, I get a stable display in all panning positions 
(hurray!).

I would like to fix this, but I guess I would need to understand the 
logic a little bit better. At the time being, you probably better
put the linear frame buffer workaround on hold, it looks I really got 
something here.

Greetings,
     Thomas
Daniel Vetter Nov. 15, 2013, 3:41 p.m. UTC | #12
On Fri, Nov 15, 2013 at 02:16:11PM +0100, Thomas Richter wrote:
> Hi Daniel, hi others,
> 
> did even more experiments. I guess I understand now better. Indeed,
> the trouble seems to be the watermark levels. I played more
> with all that, and the real culprit seems to be the FW_BLC register
> controlling the watermarks.
> 
> On the i830 with the current settings, it is defined to be 0x1080304
> which sets the watermark a bit too low. If I set it to
> 0x1080306 instead, I get a stable display in all panning positions
> (hurray!).
> 
> I would like to fix this, but I guess I would need to understand the
> logic a little bit better. At the time being, you probably better
> put the linear frame buffer workaround on hold, it looks I really
> got something here.

Gosh, should have read the code more closely. We have a totally botched wm
setup on i830M - the watermark code for the 2nd pipe is just not there!

I'll try to wip up a patch to fix this.
-Daniel
Thomas Richter Nov. 15, 2013, 4:08 p.m. UTC | #13
Am 15.11.2013 16:41, schrieb Daniel Vetter:
>
> Gosh, should have read the code more closely. We have a totally botched wm
> setup on i830M - the watermark code for the 2nd pipe is just not there!

(-: Guess that explains something. Just disregard my earlier patch, 
simply superfluous.
In the meantime, I recompiled the code and *decreased* the latency from 
5000 to 3000, then getting a stable image,
even the boot console is then stable (has never been before). Its still 
off by half a screen, but no longer flickering
left and right.

However, what I do not understand about the watermark computation is 
that a *lower* latency results in a *higher* number
of entries in the FIFO. Shouldn't this quite the reverse?
In specific, I do not understand the subtraction in intel_calculate_wm, 
line 1058.

Anyhow, I let you proceed with the patch and I'm ready and happy to test it.

Greetings,
     Thomas
Thomas Richter Nov. 15, 2013, 5:01 p.m. UTC | #14
Hi Daniel,
> Gosh, should have read the code more closely. We have a totally botched wm
> setup on i830M - the watermark code for the 2nd pipe is just not there!

Well, nice try, but no cigar. (-: That's actually much worse than 
before. The display is now unstable on *both* the internal
and external display, and inspecting the FW_BLC register, it is 
completely off. The current code leaves it at 0x1050101, but
it should be at least 0x1060106, that is, the watermark needs to be 
higher, not lower. I still don't understand why a higher
latency causes a lower watermark, but maybe things work different than 
in my mental model.

Greetings,
     Thomas
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index e5eb11d..e98298f 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -2087,8 +2087,44 @@  static int i9xx_update_plane(struct drm_crtc *crtc, struct drm_framebuffer *fb,
 				     i915_gem_obj_ggtt_offset(obj) + intel_crtc->dspaddr_offset);
 		I915_WRITE(DSPTILEOFF(plane), (y << 16) | x);
 		I915_WRITE(DSPLINOFF(plane), linear_offset);
-	} else
-		I915_WRITE(DSPADDR(plane), i915_gem_obj_ggtt_offset(obj) + linear_offset);
+	} else if (INTEL_INFO(dev)->gen == 2 && IS_I830(dev) && !intel_pipe_has_type(crtc, INTEL_OUTPUT_LVDS)) {
+	  unsigned long planeadr = i915_gem_obj_ggtt_offset(obj) + linear_offset;
+	  DRM_DEBUG_KMS("Plane address is 0x%lx",planeadr);
+	  // I830 panning flicker work around. Only for non-LVDS output, only for i830.
+	  if (obj->tiling_mode != I915_TILING_NONE) {
+	    if ((planeadr & 0x40)) {
+	      DRM_DEBUG_KMS("Detected potential flicker in tiling mode");
+	      DRM_DEBUG_KMS("No workaround available");
+	      DRM_DEBUG_KMS("Use a linear frame buffer");
+	    }
+	  } else {
+	    switch (fb->pixel_format) {
+	    case DRM_FORMAT_XRGB1555:
+	    case DRM_FORMAT_ARGB1555:
+	    case DRM_FORMAT_RGB565:
+	    case DRM_FORMAT_XRGB8888:
+	    case DRM_FORMAT_ARGB8888:
+	    case DRM_FORMAT_XBGR8888:
+	    case DRM_FORMAT_ABGR8888:
+	      {
+		unsigned long int oldadr = I915_READ(DSPADDR(plane));
+		if (((oldadr ^ planeadr) & 0x40) && (planeadr & 0xc0) == 0xc0) {
+		  DRM_DEBUG_KMS("Detected potential flicker in linear mode");
+		  I915_WRITE(DSPADDR(plane), planeadr & (~(0x80)));
+		  POSTING_READ(reg);
+		  intel_wait_for_vblank(dev,intel_crtc->pipe);
+		}
+	      }
+	      break;
+	    default:
+	      DRM_DEBUG_KMS("No flicker workaround available\n");
+	      break;
+	    }
+	  }
+	  I915_WRITE(DSPADDR(plane), planeadr);
+	} else {
+	  I915_WRITE(DSPADDR(plane), i915_gem_obj_ggtt_offset(obj) + linear_offset);
+	}
 	POSTING_READ(reg);
 
 	return 0;