Small vc4 kms fixes and some questions.

On 05/13/2016 08:51 PM, Eric Anholt wrote:
> Mario Kleiner <mario.kleiner.de@gmail.com> writes:
>
>> On 05/09/2016 09:38 PM, Eric Anholt wrote:
>>> Mario Kleiner <mario.kleiner.de@gmail.com> writes:
>>>
>>>> Hi Eric and all,
>>>>
>>>> two small fixes against vc4 kms, built and tested against the
>>>> Raspberry Pi Foundation's 4.4.8 kernel tree on RPi2B.
>>>>
>>>> I'm tinkering with a Rpi 2B a bit to see if your vc4 work can
>>>> already make the Pi useful as a device for some serious but low
>>>> cost neuro-science applications.
>>>>
>>>> Eric:
>>>>
>>>> Is there any public documentation about the HVS hardware video scaler
>>>> or the pixel valves? I could find other docs about Videocore's 3d
>>>> part, but nothing about hvs or the pixel valves? Or are the register
>>>> definitions inside the vc4 already all that exists in the hw?
>>>
>>> Nope, docs for display never got released.  I've got docs internally,
>>> and I'm happy to try to answer questions.
>>>
>>
>> Ah good. My questions are around making the pageflip completion events
>> and vblank timestamps as precise and reliable as possible.
>>
>> Atm. i'm working on a patch to the flip completion handling to make it
>> more robust. The current code in my stress tests with 10000 flips sends
>> out flip completions too early in about 1-2% of the trials.
>>
>> My current patch reduces these to 0 failures in my test. I'll send the
>> patch out later after some more tweaking and cleanup.
>>
>> E.g., for crtc/pixelvalve 1 the patch checks if the SCALER_DISPLIST1 and
>> SCALER_DISPLACT1 registers in the HVS match, and only then sends out the
>> pageflip completion event, otherwise waits for the next vblank. I assume
>> SCALER_DISPLACT1 is the true current value for start of active display
>> list whereas SCALER_DISPLIST1 is the value that got latched and then
>> gets committed at the next vblank after writing to the reg. This seems
>> to work well according to my special measurement equipment which can
>> timestamp the true start of display of a new framebuffer.
>
> Oh, interesting.  The docs had said that the display lists were
> re-parsed on every line, so I assumed that also meant that the reparsing
> started from the head pointer every line.  I've confirmed in the
> hardware what you've found: the head pointer is only latched at VSTART
> or INIT signal to the display fifo (that start signal comes from the
> pixel valve).
>

Yes. You can observe it also as a symptom with async pageflips, where 
the return from flip is async, but the display still doesn't tear.

>> However i don't know if this is already perfect, or just a strongly
>> reduced error rate, so i need to know when the value gets committed
>> (start of vblank, end of vblank, vsync)? And when does the vblank irq
>> fire for the different possible settings of:
>>
>> # define PV_INT_VID_IDLE                        BIT(9)
>> # define PV_INT_VFP_END                         BIT(8)
>> # define PV_INT_VFP_START                       BIT(7)
>> # define PV_INT_VACT_START                      BIT(6)
>> # define PV_INT_VBP_START                       BIT(5)
>> # define PV_INT_VSYNC_START                     BIT(4)
>>
>> Currently the driver uses PV_INT_VFP_START.
>
> OK, looks like the PV timing state is:
>
> IDLE
> START
> (vfp lines)
> VSYNC
> (vsw lines)
> VBP
> (vbp lines)
> VACT
> (vact lines)
> VFP
> (jump to waiting for vfp lines)
>
> The normal timing loop after START does its transitions at HFP, and
> those transitions are when you get the VFP/VBP/VSYNC START/END
> interrupts.
>
> You'll jump back to IDLE almost right away if VID_EN is dropped.
>
> The VSTART signal to the HVS is when PV does IDLE -> START or on the
> last pixel of the active scanout.  Note, this is *not* the PV's START
> signal, which looks like it's basically unused.  Also, I think it's an
> interesting note that we don't have VFP_START on our first frame, as far
> as I've found.
>
> The PV requests that the HVS generate the next line at the last pixel of
> each HACTIVE when we're either at end of VBP or VACTIVE-but-not-its-end.
>
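
The timing loop listed above can be sketched as a small state machine. This is only a userspace model under assumptions: the enum and function names are hypothetical and do not exist in the vc4 driver, each state stands for "that phase plus its lines", and the VSTART condition "last pixel of the active scanout" is approximated by the VACT -> VFP transition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state names; the order follows the list in the thread. */
enum pv_model_state {
	PV_MODEL_IDLE,
	PV_MODEL_START,
	PV_MODEL_VFP,	/* waiting out the vfp lines */
	PV_MODEL_VSYNC,	/* vsw lines */
	PV_MODEL_VBP,	/* vbp lines */
	PV_MODEL_VACT,	/* vact lines */
};

/* Advance the model by one phase. Dropping VID_EN returns to IDLE. */
static enum pv_model_state pv_model_next(enum pv_model_state s, bool vid_en)
{
	if (!vid_en)
		return PV_MODEL_IDLE;

	switch (s) {
	case PV_MODEL_IDLE:	return PV_MODEL_START;
	case PV_MODEL_START:	return PV_MODEL_VFP;
	case PV_MODEL_VFP:	return PV_MODEL_VSYNC;
	case PV_MODEL_VSYNC:	return PV_MODEL_VBP;
	case PV_MODEL_VBP:	return PV_MODEL_VACT;
	case PV_MODEL_VACT:	return PV_MODEL_VFP;
	}
	assert(0 && "invalid state");
	return PV_MODEL_IDLE;
}

/*
 * VSTART towards the HVS: raised on IDLE -> START, or at the end of
 * active scanout (approximated here as VACT -> VFP).
 */
static bool pv_model_signals_vstart(enum pv_model_state from,
				    enum pv_model_state to)
{
	return (from == PV_MODEL_IDLE && to == PV_MODEL_START) ||
	       (from == PV_MODEL_VACT && to == PV_MODEL_VFP);
}
```

In this model the first frame enters its vfp lines from START rather than from VACT, which is consistent with the observation that no VFP_START shows up on the first frame.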

Thanks for the detailed explanation Eric. The way this is designed is 
almost as if the Broadcom hw engineers had read our DRM vblank handling 
and DRI swap scheduling code and decided to build the hw to exactly 
match the expectations of our code :)

This means that my patch for pageflip completion robustness should be 
perfectly robust and not subject to races with the hardware :) - My 
testing over many runs of 10000's of flips confirms that. I've sent the 
patch out for review, cc'ed to you, and it should be fine as is for 
inclusion if you are happy with it.
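
For readers without the patch at hand, the completion check discussed in this thread can be sketched roughly as follows. This is a userspace model, not the patch itself: the struct and function names are hypothetical, `displist` stands in for SCALER_DISPLISTx and `dispact` for SCALER_DISPLACTx, and the latch-at-VSTART behaviour is taken from the description in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one HVS channel; not a vc4 driver structure. */
struct hvs_channel_model {
	uint32_t displist;	/* models SCALER_DISPLISTx: head written by the driver */
	uint32_t dispact;	/* models SCALER_DISPLACTx: head the hardware is using */
};

/* Driver side: queue a new display list head for the next frame. */
static void model_queue_flip(struct hvs_channel_model *ch, uint32_t new_head)
{
	assert(ch);
	ch->displist = new_head;
}

/* Hardware side: the head pointer is only latched at VSTART. */
static void model_vstart(struct hvs_channel_model *ch)
{
	assert(ch);
	ch->dispact = ch->displist;
}

/*
 * Vblank irq side: only report flip completion once the hardware has
 * picked up the new head; otherwise wait for the next vblank.
 */
static bool model_flip_completed(const struct hvs_channel_model *ch)
{
	assert(ch);
	return ch->dispact == ch->displist;
}
```

A vblank irq that arrives between model_queue_flip() and model_vstart() sees mismatching values and would defer the completion event by one vblank.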

>> The second question is if the HVS or pixelvalves have some kind of
>> scanline register that reports the currently scanned out scanline? I'd
>> like to implement scanout position queries, so we can get instant high
>> precision vblank timestamps if possible like we have for intel, amd and
>> nouveau, so we'd have precise timestamps, a vblank counter and also
>> additional power savings. Or lacking that are there other regs that
>> could be used to timestamp vblanks or updates of display lists in the
>> hardware?
>
> HVS has bits 0:11 of DISPSTATx for the Y line being generated.  That
> will be in a different clock domain from the PV, but it's probably good
> enough, right?
>

Mostly. It's not quite as good as having the true scanout position from
the PVs. Attached is my current WIP patch for scanoutpos-based
timestamping. It already works quite well in my timing tests, but as
you can see from the code and the longish explanation, we can't get
close to perfect accuracy if we query the timestamp while the PV is in
vblank, and need some trickery to get ok'ish results there.

The problem, as far as my understanding of the hw from the results goes,
is that the HVS has a line buffer fifo which can hold a number of
composited scanlines, e.g., 13 - 24 for typical video resolutions, for
later consumption by the PV. The HVS refills much faster than the PV
consumes. During active scanout that means the PV and HVS work in
lockstep: the HVS fifo is almost always completely full and the HVS is
throttled to the rate at which the PV consumes. This is good, because
the scanline position of the PV is then the compositing position of the
HVS from DISPSTATx minus the capacity of the fifo.
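
The relation for active scanout can be written down as a minimal sketch. The mask for bits 0:11 follows Eric's description of DISPSTATx; the macro and function names are hypothetical, and the fifo capacity is passed in because its real value is still an open question.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 0:11 of DISPSTATx hold the Y line being generated (per this thread). */
#define MODEL_DISPSTAT_LINE_MASK 0xfffu

/*
 * Estimate the PV scanout line during active scanout: the HVS runs
 * fifo_lines ahead of the PV while the fifo is full. A result below
 * zero means the estimate is not usable as an active-scanout position.
 */
static int model_pv_line_estimate(uint32_t dispstat, int fifo_lines)
{
	int hvs_line = (int)(dispstat & MODEL_DISPSTAT_LINE_MASK);

	assert(fifo_lines >= 0);
	return hvs_line - fifo_lines;
}
```

This only holds while the fifo is full, so it does not apply near the end of active scanout or during vblank.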

At the last few lines of active scanout the HVS stops compositing while
the PV drains the fifo, so our position estimation gets inaccurate - not 
a big deal in practice. In VBLANK however, we don't get any meaningful 
reading because the HVS apparently quickly refills the fifo to full 
capacity after the VSTART signal from the PV and then it is idle until 
start of active scanout when the PV starts to consume from the fifo again.

You can see multiple special cases for "in vblank" in the patch, which
deal with this in an ok'ish way.

Anyway, this still gives us almost all the advantages of scanoutpos-based
timestamping, except for the blind spot in vblank when we can't use HVS
readings. Even so, the timestamps will always be accurate to an error of
less than 1 vblank duration or ~1 msec, and typically accurate to about
0.1 msecs, according to my measurements.

Two questions:

1. Can you tell me something about the size of that fifo - capacity in 
lines, depending on horizontal resolution? My heuristic formula in the 
patch...

fifo_lines = (2048 * 7 / mode->crtc_hdisplay) - 1;

... seems to work well for the three or four video modes i could 
actually test. But having the real numbers would be better.
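
To make the heuristic concrete, here it is as a standalone function that can be evaluated for a few horizontal resolutions. The function name is hypothetical, and the constant 2048 * 7 is the guess from the patch, not a documented fifo size.

```c
#include <assert.h>

/*
 * Heuristic fifo capacity in lines for a given horizontal resolution,
 * using integer division as in the patch.
 */
static int model_fifo_lines(int crtc_hdisplay)
{
	assert(crtc_hdisplay > 0);
	return (2048 * 7 / crtc_hdisplay) - 1;
}
```

With integer division this gives 21 lines for a 640 pixel wide mode, 13 lines for 1024 and 6 lines for 1920.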

2. In the special case hack for vpos < fifo_lines, i try to estimate the 
refill speed of the HVS as ratio between HVS clock and mode clock...

*vpos = -vblank_lines + (*vpos * mode->crtc_clock / 250000);

... under the assumption that the HVS clock is == system clock and that
clock is a constant 250 MHz, based on some numbers from some of the
public docs. However, i'm not sure if the 250 MHz is right, or if this
is even constant across SoCs or wrt. power management. That specific
code path so far doesn't really improve precision. I'm not sure if i
should drop it, or refine it, or how. But maybe my assumptions about HVS
composition rate vs. PV scanout rate are wrong there, or the clock value
is wrong?
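
The scaling in that code path can be tried out in isolation with the sketch below. The names are hypothetical, the 250 MHz HVS clock is the unconfirmed assumption discussed above, and the pixel clock is taken in kHz as in mode->crtc_clock.

```c
#include <assert.h>

/* Assumed HVS clock in kHz (250 MHz); unconfirmed, see the question above. */
#define MODEL_HVS_CLOCK_KHZ 250000

/*
 * Map an HVS line count read shortly after VSTART to a negative
 * "lines before active scanout" position, scaling by the ratio of
 * pixel clock to the assumed HVS clock.
 */
static int model_vblank_vpos(int hvs_line, int crtc_clock_khz, int vblank_lines)
{
	assert(hvs_line >= 0 && crtc_clock_khz > 0);
	return -vblank_lines +
	       (int)((long long)hvs_line * crtc_clock_khz / MODEL_HVS_CLOCK_KHZ);
}
```

For example, with a 148500 kHz pixel clock and 45 vblank lines, an HVS reading of 5 lines maps to 2 PV lines after the start of vblank, i.e. a vpos of -43.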

thanks,
-mario
