Message ID | 527D024D.8020000@math.tu-berlin.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Am 08.11.2013 17:32, schrieb Daniel Vetter: > Kernel has a tool in scripts/checkpatch.pl which will tell you what's all > off ;-) Also sob line and similar essential things are missing, but the > script should notice this all. > > Also I think it'd be good to extract this hack into a little helper > function, maybe called i830_plane_panning_hack or so. That way it's out of > the normal code flow and clearly marked as something exceptionel. Thanks, I'll try that. Still working on it, though. > >> As a related question: Is there possibly a command line tool that >> would allow me to modify the intel chipset registers on the fly >> without going through a kernel recompile? It would make some >> experiments just so much simpler. > intel-gpu-tools has intel_reg_read/write tools. That should help ;-) These tools are exceptional, and exactly what I was looking for. Thanks a lot! Now, how much is known about the register DSPARB, found at offset 0x70030? Because, if I just feed this register with "correct" values (for whatever "correct" means), I do get a stable image on pipe A and pipe B. I haven't found out what "correct" is, precisely, but there is something I can do here. By default, on my setup, the value here is 0x17e5f, but depending on the scroll position, 0x17e51, 0x17e61, 0x17e6f or 0x17e63 create stable images. So there is at least some way how to get the image right, though I don't quite understand the mechanics yet. BTW, now that I looked at intel_regs_dump: When the screen is flickering, I do get "pipe A underruns". I tried working with the watermark registers, but all I can get is a crashed GPU or a ripped display, so this is not quite the right approach. DSPARB seems to be the best trick so far, even though I don't quite get how it works. Any ideas? Greetings, Thomas
On Mon, Nov 11, 2013 at 4:33 PM, Thomas Richter <thor@math.tu-berlin.de> wrote: > Now, how much is known about the register DSPARB, found at offset 0x70030? > > Because, if I just feed this register with "correct" values (for whatever > "correct" means), I do get a stable image > on pipe A and pipe B. I haven't found out what "correct" is, precisely, but > there is something I can do here. > By default, on my setup, the value here is 0x17e5f, but depending on the > scroll position, 0x17e51, 0x17e61, 0x17e6f > or 0x17e63 create stable images. So there is at least some way how to get > the image right, though I don't quite > understand the mechanics yet. > > BTW, now that I looked at intel_regs_dump: When the screen is flickering, I > do get "pipe A underruns". I tried working > with the watermark registers, but all I can get is a crashed GPU or a ripped > display, so this is not quite the right approach. > DSPARB seems to be the best trick so far, even though I don't quite get how > it works. Oh, that's really interesting. gen2 has a unified display fifo on machines that support 2 outputs. DSPARB tells the hw how to exactly split this up between the two pipes. There are two bit ranges of interest here: bits0-8: "AEND. This field selects one of the splits between two of the three portions of the display RAM allocated between display plane A, plane B, and display plane C. This field can only be programmed when one of the two pipes is disabled. When it is written. It takes effect on the next VBLANK for whichever pipe is currently active. The control granularity is 16-bytes. The total size of the Almador-M RAM is 288*16 bytes. The total size of the Montara-GM RAM is 256*16 bytes." bits 9-18: "BEND. This field selects one of the split between the two of the three portions of the display RAM allocated between display planes A, B, and C. This field can only be programmed when one of the two display pipes is disabled. It takes effect on the next VBLANK for whichever pipe is currently active. It must be programmed to a number larger than the value in AEND. The control granularity is 16-bytes and the total size of the Almador-M RAM is 288*16 bytes and the Montara-GM RAM is 256*16 bytes." Almador-M is i830M, Montara-GM is i855GM. Other gen2 chips have a different meaning for this register. What we'd need to do here is to update this register when switching the number of active display pipes in the ->modeset_global_resources hook. We also need to make sure we have updated watermark values set up already, before rewriting the value of DSPARB (since the watermarks depend upon the size of the fifo). For I start I'd go with splitting the fifo according to the display clock between plane A and B and giving nothing to plane C. We don't have any code to use plane C so giving everything to just A and B is better. I hope this helps you in your endeavor. Otherwise feel free to ping me on irc or mail. -Daniel
Am 11.11.2013 16:43, schrieb Daniel Vetter: > Oh, that's really interesting. gen2 has a unified display fifo on > machines that support 2 outputs. DSPARB tells the hw how to exactly > split this up between the two pipes. There are two bit ranges of > interest here: /* snip */ Hmm, why I understand *that* it does make a difference, I do not understand the details. By a unified display fifo do you mean that the display output has an internal buffer memory (the fifo) which basically feeds the the DVOs or the LVDS with memory, which comes via DMA into the fifo. Is that right? By "split", do you mean that a fixed amount of bytes (or rather, lines as in multiples of 16 bytes) are allocated for each participating pipe? Simply enlarging the fifo does not help (i.e. writing a larger value into the register). Just the positions where I get the flicker change, but the problem does not go away. So whatever needs to be done is to adjust this register according to the alignment of the base address of the corresponding DMA engine that feeds the pipe. > What we'd need to do here is to update this register when switching > the number of active display pipes in the ->modeset_global_resources > hook. We also need to make sure we have updated watermark values set > up already, before rewriting the value of DSPARB (since the watermarks > depend upon the size of the fifo). > > For I start I'd go with splitting the fifo according to the display > clock between plane A and B and giving nothing to plane C. We don't > have any code to use plane C so giving everything to just A and B is > better. By that you mean "BEND = maximum" and "AEND" in between? That does not seem to be sufficient. It needs to be modified according to the buffer alignment, and sometimes smaller values work, sometimes larger ones. Greetings, Thomas
On Tue, Nov 12, 2013 at 05:41:12PM +0100, Thomas Richter wrote: > Am 11.11.2013 16:43, schrieb Daniel Vetter: > >Oh, that's really interesting. gen2 has a unified display fifo on > >machines that support 2 outputs. DSPARB tells the hw how to exactly > >split this up between the two pipes. There are two bit ranges of > >interest here: > > /* snip */ > > Hmm, why I understand *that* it does make a difference, I do not > understand the details. > By a unified display fifo do you mean that the display output has an > internal buffer memory (the fifo) which > basically feeds the the DVOs or the LVDS with memory, which comes > via DMA into the fifo. Is that right? Yup. The fifo is there ot paper over memory fetch latencies. The memory controller sends 64b blocks, but the display output needs to be continous. > By "split", do you mean that a fixed amount of bytes (or rather, > lines as in multiples of 16 bytes) are allocated for each > participating pipe? Yup. > Simply enlarging the fifo does not help (i.e. writing a larger value > into the register). Just the positions where I get the flicker > change, but the problem does not go away. So whatever needs to be > done is to adjust this register according to the alignment of the > base address of the corresponding DMA engine that feeds the pipe. Hm, that's strange. > >What we'd need to do here is to update this register when switching > >the number of active display pipes in the ->modeset_global_resources > >hook. We also need to make sure we have updated watermark values set > >up already, before rewriting the value of DSPARB (since the watermarks > >depend upon the size of the fifo). > > > >For I start I'd go with splitting the fifo according to the display > >clock between plane A and B and giving nothing to plane C. We don't > >have any code to use plane C so giving everything to just A and B is > >better. > > By that you mean "BEND = maximum" and "AEND" in between? That does > not seem to be sufficient. It needs to be modified according to the > buffer alignment, and sometimes smaller values work, sometimes > larger ones. Yeah, I've meant BEND = max and AEND split so that the two resulting sizes are proportional to the pixel clock (i.e. both pipes should take equal amount of time roughly to go through the full fifo). Indeed really strang that this doesn't seem to work. Looking at docs I don't see any mention of a w/a :( The only (totally crazy) idea I have is that the fifo falls over if it fetches accross a tile/page boundary. But no idea how to figure out a formula that'd work everywhere ... To simplify things I'd start with just just one pipe display to VGA and then going through different resolutions and different offset. Maybe there's a pattern in the display fifo lenghts that work. Happy hacking! Cheers, Daniel
On 12.11.2013 18:22, Daniel Vetter wrote: Thanks for the explanation how the fifos work, that was helpful. > Yeah, I've meant BEND = max and AEND split so that the two resulting sizes > are proportional to the pixel clock (i.e. both pipes should take equal > amount of time roughly to go through the full fifo). Indeed really strang > that this doesn't seem to work. > > Looking at docs I don't see any mention of a w/a :( Too bad. I tried to find some systematic in the settings where I do get flicker and where not, in specific which settings of AEND create the problem. However, it's more complicated than I thought. It not only depends on the current scroll position, but also on the history (!) of what you did before. It's probably the relative fill position of the FIFOs that plays the role here. If the fifo was relatively full before starting the scroll, everything remains fine. Otherwise, it runs dry too soon. If you keep the history consistent, the outcome is consistent, but otherwise results are hard to predict. Maybe the whole approach of adjusting the start address of the DMA engine to realize scrolling is not quite right, and the engine prefers to have data aligned to 64 byte boundaries or something like this. Is there some other way to delay the output of the FIFO into the pipe? Some very antique systems (ehem, like the Atari 800 or Amiga chipsets ;-) had "horizontal scroll registers" that realized a pixel delay for an otherwise byte-oriented pipeline. Is there something like this in the i830 such that the actual start address of the DMA engine would remain constant, but the pipeline would refer or disregard the first n elements? This would avoid the whole problem. The trouble is not only cosmetical (the flicker): If that happens, any application that uses an X video overlay misbehaives (xine crashes the system really badly) so the pipe-A underrun should be avoided at all costs. Concerning the intel gpu tools: Currently, they just brute-force "poke" into the display registers. Is there some way to synchronize this activity to vblank from user space (I got their sources, so modifying them would work, but I would need some kind of mechanism to wait for vblank to do that savely). > The only (totally crazy) idea I have is that the fifo falls over if it > fetches accross a tile/page boundary. But no idea how to figure out a > formula that'd work everywhere ... Exactly. Might be possible, given that I have a relatively easy fix for sequential, but not for tiled. If I may add another question: How exactly does the tiled access then work? As far as I understand, the video RAM is still organized linearly, but the DMA engine fetches data differently? Is this right? How exactly are the tiles laid out, and how does memory fetches then work? > To simplify things I'd start with just just one pipe display to VGA and > then going through different resolutions and different offset. Maybe > there's a pattern in the display fifo lenghts that work. See above. Nope. Or rather "to some extend", but it's more complicated than that. I wonder how that works in windows, though. The intel driver does support panning to some degree, at least if the resolutions of the internal and external display differ. > Happy hacking! I'm certainly having fun here. (-: Greetings, Thomas
On Wed, Nov 13, 2013 at 08:50:50PM +0100, Thomas Richter wrote: > On 12.11.2013 18:22, Daniel Vetter wrote: > > Thanks for the explanation how the fifos work, that was helpful. > > > >Yeah, I've meant BEND = max and AEND split so that the two resulting sizes > >are proportional to the pixel clock (i.e. both pipes should take equal > >amount of time roughly to go through the full fifo). Indeed really strang > >that this doesn't seem to work. > > > >Looking at docs I don't see any mention of a w/a :( > > Too bad. I tried to find some systematic in the settings where I do > get flicker and where not, in specific which settings of AEND create > the problem. However, it's more complicated than I thought. It not > only depends on the current scroll position, but also on the history > (!) of what you did before. It's probably the relative fill position > of the FIFOs that plays the role here. If the fifo was relatively > full before starting the scroll, everything remains fine. Otherwise, > it runs dry too soon. If you keep the history consistent, the > outcome is consistent, but otherwise results are hard to predict. Hm, scrolling should be vsynced, and I'd expect the fifo to properly refill in the vblank. So this is pretty astonishing. Could it be that rendering activity affects this, or is this always with an otherwise completely idle desktop? > Maybe the whole approach of adjusting the start address of the DMA > engine to realize scrolling is not quite right, and the engine > prefers to have data aligned to 64 byte boundaries or something like > this. Is there some other way to delay the output of the FIFO into > the pipe? Some very antique systems (ehem, like the Atari 800 or > Amiga chipsets ;-) had "horizontal scroll registers" that realized a > pixel delay for an otherwise byte-oriented pipeline. > > Is there something like this in the i830 such that the actual start > address of the DMA engine would remain constant, but the pipeline > would refer or disregard the first n elements? This would avoid the > whole problem. Nope, we don't have a buffer offset in pixels like on later generations. > The trouble is not only cosmetical (the flicker): If that happens, > any application that uses an X video overlay misbehaives (xine > crashes the system really badly) so the pipe-A underrun should be > avoided at all costs. Yeah, the overlay is extremely fickle and if it's dead it'll never work again until you reboot. And once it crashed all Xv apps won't work any more :( > Concerning the intel gpu tools: Currently, they just brute-force > "poke" into the display registers. Is there some way to synchronize > this activity to vblank from user space (I got their sources, so > modifying them would work, but I would need some kind of mechanism > to wait for vblank to do that savely). The registers are vblank synced already. So if you poke both the dsparb and the plane base register from userspace to fake scrolling you should have as good a vsync as the kernel will get (if you script both register writes together ofc). > > >The only (totally crazy) idea I have is that the fifo falls over if it > >fetches accross a tile/page boundary. But no idea how to figure out a > >formula that'd work everywhere ... > > Exactly. Might be possible, given that I have a relatively easy fix > for sequential, but not for tiled. If I may add another question: > How exactly does the tiled access then work? As far as I understand, > the video RAM is still organized linearly, but the DMA engine > fetches data differently? Is this right? How exactly are the tiles > laid out, and how does memory fetches then work? Tile buffers aren't linear any more in memory. Tiles are 2kb in size and are laid out in x-major direction. The tile itself is 128 bytes wide and 16 lines high. So presuming you start scanning out out at offset 0 the dma-engine will read: 0 - 127, 2k - (2k+127), 4k - (4k+127), ... for the first display lane 128 - 255, (2k+128) - (2k+255), (4k+128) - (4k+255), ... for the 2nd display lane ... > >To simplify things I'd start with just just one pipe display to VGA and > >then going through different resolutions and different offset. Maybe > >there's a pattern in the display fifo lenghts that work. > > See above. Nope. Or rather "to some extend", but it's more > complicated than that. > > I wonder how that works in windows, though. The intel driver does > support panning to some degree, at least if the resolutions of the > internal and external display differ. Could be that windows scrolls in software by copying the buffer around. Maybe the tearfree option can help here, although I expect it to hurt badly for gtt constrained i830M platforms. And tbh I don't know whether it works with panning. Cheers, Daniel
On 13.11.2013 21:20, Daniel Vetter wrote: Thanks Daniel for your explanations. As always, very helpful. > > Tile buffers aren't linear any more in memory. Tiles are 2kb in size and > are laid out in x-major direction. The tile itself is 128 bytes wide and > 16 lines high. So presuming you start scanning out out at offset 0 the > dma-engine will read: > > 0 - 127, 2k - (2k+127), 4k - (4k+127), ... for the first display lane > 128 - 255, (2k+128) - (2k+255), (4k+128) - (4k+255), ... for the 2nd display lane > ... Hmm, then there is something I don't quite understand. If I check the code for the plane buffer start address computation, I do indeed see something like tile addressing for GEN4 and up, but for GEN2, I simply see a linear address = x + y * stride. This does not look consistent: /* snip : code from i9xx_update_plane() in i915_display.c */ linear_offset = y * fb->pitches[0] + x * (fb->bits_per_pixel / 8); if (INTEL_INFO(dev)->gen >= 4) { intel_crtc->dspaddr_offset = intel_gen4_compute_page_offset(&x, &y, obj->tiling_mode, fb->bits_per_pixel / 8, fb->pitches[0]); linear_offset -= intel_crtc->dspaddr_offset; } else { intel_crtc->dspaddr_offset = linear_offset; } /* snip */ Are you *really sure* GEN2 does have tiling? Or could it be that this bit is used for something else and probably turns on some weird powersaving feature that creates some mischief with the FIFO? After all, a tile cannot always be 128 pixels long independent of the display organization (i.e. "pixel format") *and* have the above formula correct? There is then at least something I must be missing. >> I wonder how that works in windows, though. The intel driver does >> support panning to some degree, at least if the resolutions of the >> internal and external display differ. > > Could be that windows scrolls in software by copying the buffer around. > Maybe the tearfree option can help here, although I expect it to hurt > badly for gtt constrained i830M platforms. And tbh I don't know whether it > works with panning. Actually, scrolling is quite smooth, I doubt it's a software scrolling. But anyhow... Greetings, Thomas
On Thu, Nov 14, 2013 at 8:14 AM, Thomas Richter <thor@math.tu-berlin.de> wrote: > Are you *really sure* GEN2 does have tiling? Or could it be that this bit is > used for something else and probably turns on some weird powersaving feature > that creates some mischief with the FIFO? After all, a tile cannot always be > 128 pixels long independent of the display organization (i.e. "pixel > format") *and* have the above formula correct? > > There is then at least something I must be missing. On gen2/3 the fence registers make a tile range look linear to both the gpu and the cpu. On gen4+ the fence registers are only for access with the cpu, and everything else needs to take tiling into account explicitly (and there are bits in the registers to tell the gpu that something is tiled). See the various functions with fence in their name in i915_gem.c for how this is set up/tracked. -Daniel
On 14.11.2013 09:21, Daniel Vetter wrote: > On gen2/3 the fence registers make a tile range look linear to both > the gpu and the cpu. On gen4+ the fence registers are only for access > with the cpu, and everything else needs to take tiling into account > explicitly (and there are bits in the registers to tell the gpu that > something is tiled). See the various functions with fence in their > name in i915_gem.c for how this is set up/tracked. Hmm. Probably I still don't quite understand. Memory is shared between CPU and GPU, so why does a memory write or read by the CPU depend on the GPU programming? The GPU is, after all, not a MMU that manipulates the address bus of the CPU (or does it?) If the start address of the display is altered, are the tiles always relative to this start, i.e. at the same absolute display position, or do they move on screen, i.e. are at the same absolute memory position? Would it possibly make sense to check whether modifying the fence registers avoids the problem because then the tiles move, and probably stay aligned to 16-byte (or 32-byte) boundaries? Greetings, Thomas
On Thu, Nov 14, 2013 at 07:15:52PM +0100, Thomas Richter wrote: > On 14.11.2013 09:21, Daniel Vetter wrote: > > >On gen2/3 the fence registers make a tile range look linear to both > >the gpu and the cpu. On gen4+ the fence registers are only for access > >with the cpu, and everything else needs to take tiling into account > >explicitly (and there are bits in the registers to tell the gpu that > >something is tiled). See the various functions with fence in their > >name in i915_gem.c for how this is set up/tracked. > > Hmm. Probably I still don't quite understand. Memory is shared > between CPU and GPU, so why does a memory write or read by the CPU > depend on the GPU programming? The GPU is, after all, not a MMU that > manipulates the address bus of the CPU (or does it?) The big pci bar in the gpu _is_ an iommu ;-) > If the start address of the display is altered, are the tiles always > relative to this start, i.e. at the same absolute display position, > or do they move on screen, i.e. are at the same absolute memory > position? Would it possibly make sense to check whether modifying > the fence registers avoids the problem because then the tiles move, > and probably stay aligned to 16-byte (or 32-byte) boundaries? Tiled areas have rather strict alignement constraints. The stride must be a power of two, the size also and the start needs to be aligned to the size. So you can't move the tiles. Also moving the tiles would totally wreak havoc with the screen contents ... -Daniel
Hi Daniel, hi others, did even more experiments. I guess I understand now better. Indeed, the trouble seems to be the watermark levels. I played more with all that, and the real culprit seems to be the FW_BLC register controlling the watermarks. On the i830 with the current settings, it is defined to be 0x1080304 which sets the watermark a bit too low. If I set it to 0x1080306 instead, I get a stable display in all panning positions (hurray!). I would like to fix this, but I guess I would need to understand the logic a little bit better. At the time being, you probably better put the linear frame buffer workaround on hold, it looks I really got something here. Greetings, Thomas
On Fri, Nov 15, 2013 at 02:16:11PM +0100, Thomas Richter wrote: > Hi Daniel, hi others, > > did even more experiments. I guess I understand now better. Indeed, > the trouble seems to be the watermark levels. I played more > with all that, and the real culprit seems to be the FW_BLC register > controlling the watermarks. > > On the i830 with the current settings, it is defined to be 0x1080304 > which sets the watermark a bit too low. If I set it to > 0x1080306 instead, I get a stable display in all panning positions > (hurray!). > > I would like to fix this, but I guess I would need to understand the > logic a little bit better. At the time being, you probably better > put the linear frame buffer workaround on hold, it looks I really > got something here. Gosh, should have read the code more closely. We have a totally botched wm setup on i830M - the watermark code for the 2nd pipe is just not there! I'll try to wip up a patch to fix this. -Daniel
Am 15.11.2013 16:41, schrieb Daniel Vetter: > > Gosh, should have read the code more closely. We have a totally botched wm > setup on i830M - the watermark code for the 2nd pipe is just not there! (-: Guess that explains something. Just disregard my earlier patch, simply superfluous. In the meantime, I recompiled the code and *decreased* the latency from 5000 to 3000, then getting a stable image, even the boot console is then stable (has never been before). Its still off by half a screen, but no longer flickering left and right. However, what I do not understand about the watermark computation is that a *lower* latency results in a *higher* number of entries in the FIFO. Shouldn't this quite the reverse? In specific, I do not understand the subtraction in intel_calculate_wm, line 1058. Anyhow, I let you proceed with the patch and I'm ready and happy to test it. Greetings, Thomas
Hi Daniel, > Gosh, should have read the code more closely. We have a totally botched wm > setup on i830M - the watermark code for the 2nd pipe is just not there! Well, nice try, but no cigar. (-: That's actually much worse than before. The display is now unstable on *both* the internal and external display, and inspecting the FW_BLC register, it is completely off. The current code leaves it at 0x1050101, but it should be at least 0x1060106, that is, the watermark needs to be higher, not lower. I still don't understand why a higher latency causes a lower watermark, but maybe things work different than in my mental model. Greetings, Thomas
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index e5eb11d..e98298f 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -2087,8 +2087,44 @@ static int i9xx_update_plane(struct drm_crtc *crtc, struct drm_framebuffer *fb, i915_gem_obj_ggtt_offset(obj) + intel_crtc->dspaddr_offset); I915_WRITE(DSPTILEOFF(plane), (y << 16) | x); I915_WRITE(DSPLINOFF(plane), linear_offset); - } else - I915_WRITE(DSPADDR(plane), i915_gem_obj_ggtt_offset(obj) + linear_offset); + } else if (INTEL_INFO(dev)->gen == 2 && IS_I830(dev) && !intel_pipe_has_type(crtc, INTEL_OUTPUT_LVDS)) { + unsigned long planeadr = i915_gem_obj_ggtt_offset(obj) + linear_offset; + DRM_DEBUG_KMS("Plane address is 0x%lx",planeadr); + // I830 panning flicker work around. Only for non-LVDS output, only for i830. + if (obj->tiling_mode != I915_TILING_NONE) { + if ((planeadr & 0x40)) { + DRM_DEBUG_KMS("Detected potential flicker in tiling mode"); + DRM_DEBUG_KMS("No workaround available"); + DRM_DEBUG_KMS("Use a linear frame buffer"); + } + } else { + switch (fb->pixel_format) { + case DRM_FORMAT_XRGB1555: + case DRM_FORMAT_ARGB1555: + case DRM_FORMAT_RGB565: + case DRM_FORMAT_XRGB8888: + case DRM_FORMAT_ARGB8888: + case DRM_FORMAT_XBGR8888: + case DRM_FORMAT_ABGR8888: + { + unsigned long int oldadr = I915_READ(DSPADDR(plane)); + if (((oldadr ^ planeadr) & 0x40) && (planeadr & 0xc0) == 0xc0) { + DRM_DEBUG_KMS("Detected potential flicker in linear mode"); + I915_WRITE(DSPADDR(plane), planeadr & (~(0x80))); + POSTING_READ(reg); + intel_wait_for_vblank(dev,intel_crtc->pipe); + } + } + break; + default: + DRM_DEBUG_KMS("No flicker workaround available\n"); + break; + } + } + I915_WRITE(DSPADDR(plane), planeadr); + } else { + I915_WRITE(DSPADDR(plane), i915_gem_obj_ggtt_offset(obj) + linear_offset); + } POSTING_READ(reg); return 0;