From patchwork Mon Nov 23 16:02:11 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Mario Kleiner X-Patchwork-Id: 7729361 Return-Path: X-Original-To: patchwork-dri-devel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id BC1499F39B for ; Mon, 30 Nov 2015 20:13:27 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 36EF32050E for ; Mon, 30 Nov 2015 20:13:26 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) by mail.kernel.org (Postfix) with ESMTP id CD5912049D for ; Mon, 30 Nov 2015 20:13:23 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id DF5E86E65F; Mon, 30 Nov 2015 12:13:22 -0800 (PST) X-Original-To: dri-devel@lists.freedesktop.org Delivered-To: dri-devel@lists.freedesktop.org Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by gabe.freedesktop.org (Postfix) with ESMTP id A769C6E65F for ; Mon, 30 Nov 2015 12:13:21 -0800 (PST) Received: from orsmga002.jf.intel.com ([10.7.209.21]) by fmsmga101.fm.intel.com with ESMTP; 30 Nov 2015 12:13:02 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.20,365,1444719600"; d="scan'208";a="861530797" Received: from stinkbox.fi.intel.com (HELO stinkbox) ([10.237.72.174]) by orsmga002.jf.intel.com with SMTP; 30 Nov 2015 12:13:00 -0800 Received: by stinkbox (sSMTP sendmail emulation); Mon, 30 Nov 2015 22:12:59 +0200 Resent-From: Ville =?iso-8859-1?Q?Syrj=E4l=E4?= Resent-Date: Mon, 30 Nov 2015 22:12:59 +0200 Resent-Message-ID: <20151130201259.GX4437@intel.com> Resent-To: dri-devel@lists.freedesktop.org X-Original-To: ville.syrjala@linux.intel.com Delivered-To: ville.syrjala@linux.intel.com Received: from linux.intel.com [10.23.219.25] by localhost with IMAP (fetchmail-6.3.26) for (single-drop); Mon, 23 Nov 2015 18:09:16 +0200 (EET) Received: from fmsmga002.fm.intel.com (fmsmga002.fm.intel.com [10.253.24.26]) by linux.intel.com (Postfix) with ESMTP id B74AA6A4083 for ; Mon, 23 Nov 2015 08:01:20 -0800 (PST) Received: from orsmga102-1.jf.intel.com (HELO mga09.intel.com) ([10.7.208.27]) by fmsmga002-1.fm.intel.com with ESMTP; 23 Nov 2015 08:02:27 -0800 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A0BGAABfN1NWlTRSfUpeFgMBAQEBDwEBAQGDXm+/Hg6BZSWFagKBRDgUAQEBAQEBAREBAQEBBwsLCR8whDQBAQEBAgESEQQZARsSCgEBAwELBgULDQkWCwICCQMCAQIBDwIRAQUBHAYBDAYCAQEeh3YBAwoIBQigcYExPjGLSIFqgnmFXgoZJw1WhBcBAQEBAQEBAQEBAQEBAQEBAQEBEQMBBQ6LRIFAAYESHoFNgzeBRAWHRIVagTOHf4JZgWFqhheBdoIkhmkMBItbhgY2gRcfAYJHDQ0JB4FXcQGFKgEBAQ X-IPAS-Result: A0BGAABfN1NWlTRSfUpeFgMBAQEBDwEBAQGDXm+/Hg6BZSWFagKBRDgUAQEBAQEBAREBAQEBBwsLCR8whDQBAQEBAgESEQQZARsSCgEBAwELBgULDQkWCwICCQMCAQIBDwIRAQUBHAYBDAYCAQEeh3YBAwoIBQigcYExPjGLSIFqgnmFXgoZJw1WhBcBAQEBAQEBAQEBAQEBAQEBAQEBEQMBBQ6LRIFAAYESHoFNgzeBRAWHRIVagTOHf4JZgWFqhheBdoIkhmkMBItbhgY2gRcfAYJHDQ0JB4FXcQGFKgEBAQ X-IronPort-AV: E=Sophos;i="5.20,337,1444719600"; d="scan'208";a="334903181" Received: from mail-wm0-f52.google.com ([74.125.82.52]) by mtab.intel.com with ESMTP; 23 Nov 2015 08:02:14 -0800 Received: by wmww144 with SMTP id w144so102591623wmw.1 for ; Mon, 23 Nov 2015 08:02:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=subject:to:references:cc:from:message-id:date:user-agent :mime-version:in-reply-to:content-type; bh=jWIqMMthLd8myikB9SPLxTh2dgJ7agn9gNKZQde5z3M=; b=UC4HTspIdBD67oq9rwfj9maK6x0rk9Bw+MfiTEFc3ueriJTF3CXqmNBRz1RkM9lttd Y8je72z3dPn8+QhxY9IRr17IAdkY9E95ZQz8itY0NZVbFgziGqcO+oZ/iV57BqHlfGft D4ZrcWwda796OjqlVtY2qOI0hsFQsvqL1Ad1gVjR7Q/cWQTJTF4uotY/dQaPfsHFjJG6 5EtL6UQ9q5rBztjJffgt/sgbdbfK0sc9d46e8fwUwkav7oJvEJU9PiM3QIdGTnxA+aps +iKVrabc1d1EZHqRe5qIgm2/ohSiv+3Rlvbj8KnC3LO1UA+gygdsq29LOmHvrdZiiPUH Kedg== X-Received: by 10.28.59.131 with SMTP id i125mr14897381wma.75.1448294533518; Mon, 23 Nov 2015 08:02:13 -0800 (PST) Received: from [172.25.194.222] (cin-11.medizin.uni-tuebingen.de. [134.2.118.242]) by smtp.gmail.com with ESMTPSA id q77sm13834137wmd.22.2015.11.23.08.02.12 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 23 Nov 2015 08:02:12 -0800 (PST) Subject: Re: Funky new vblank counter regressions in Linux 4.4-rc1 To: Alex Deucher , Harry Wentland References: <564E0AF4.3050406@gmail.com> From: Mario Kleiner Message-ID: <56533883.8020706@gmail.com> Date: Mon, 23 Nov 2015 17:02:11 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 MIME-Version: 1.0 In-Reply-To: Cc: =?UTF-8?Q?Michel_D=c3=a4nzer?= X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" X-Spam-Status: No, score=-4.1 required=5.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED, DKIM_SIGNED,RCVD_IN_DNSWL_MED,T_DKIM_INVALID,T_RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On 11/20/2015 04:42 AM, Alex Deucher wrote: > On Thu, Nov 19, 2015 at 12:46 PM, Mario Kleiner > wrote: >> Hi Alex and Michel and Ville, >> >> it's "fix vblank stuff" time again ;-) > > Adding Harry from our display team. He might be able to fill in the > blanks of on some of this better than I can. It might also be worth > checking to see how our DAL (our new display code which is being > developed directly by our display team) code handles this. It could > be that we are just missing register settings: Thanks Alex! And hello Harry :) > http://cgit.freedesktop.org/~agd5f/linux/log/?h=DAL-wip I'll have a look at this. > Additionally we've published full registers headers for the display block: > http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/include/asic_reg/dce > The DCE8 stuff should generally apply back to DCE4. If you have > questions about registers older asics not covered in the hw docs, let > me know. Note the new headers are dword aligned rather than byte > aligned. > I've tested now with two different progressive modes on DCE3 and one progressive mode on DCE4, the only cards i have atm. So far it seems that the framecounter indeed increments when the vpos from the scanout position query jumps back to zero. Attached for reference is my current patch to radeon-kms. This one seems to work reliably so far, also if i enable the immediate vblank irq disable which we so far only used on intel-kms. But according to this patch the framecounter increment happens somewhere in the middle of vblank. Essentially from the vpos extracted from EVERGREEN_CRTC_STATUS_POSITION which defines start of vblank ("Start" part of EVERGREEN_CRTC_V_BLANK_START_END) until maximum ie. VTOTAL-1 the framecounter stays on the count of the old previous scanout cycle. Then when vpos wraps to zero the framecounter increments by 1. And then we have another couple of dozen lines inside vblank until reaching the "End" part of EVERGREEN_CRTC_V_BLANK_START_END and entering active scanout for the new frame. So the position of observed framecounter increment seems to be not close to the end of vblank ("start of frame" indicator?), but a couple of scanlines after start of vblank. E.g., for a 2560x1440 video mode at 60 Hz, start of vblank is 1478, vtotal is 1481, end of vblank is 38. So i enter the vblank and see the old framecounter for vpos = 1478, 1479, 1480, then it wraps to 0 and the framecounter increments by 1, then 38 scanlines later the vblank ends. So i seem to have something that seems to work in practice and this "increment framecounter if vpos wraps back to zero" behavior makes some sense. It just doesn't conform to what those descriptions for start_line and "start of frame" indicator describe? I'll test with a few more video modes. thanks, -mario >> >> Ville's changes to the DRM's drm_handle_vblank() / drm_update_vblank_count() >> code in Linux 4.4 not only made that code more elegant, but also removed the >> robustness against the vblank irq quirks in AMD hw and similar hardware. So >> now i get tons of off-by-one errors and >> >> "[ 432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip >> completion event has impossible msc 24803 < target_msc 24804" XOrg messages >> from that kernel. >> >> One of the reasons for trouble is that AMD hw quirk where the hw fires an >> extra vblank irq shortly after vblank irq's get enabled, not synchronized to >> vblank, but typically in the middle of active scanout, so we get a redundant >> call to drm_handle_vblank in the middle of scanout. >> >> To fix that i have a minor patch to make drm_update_vblank_count() again >> robust against such redundant calls, which i will send out later to the >> mailing list. Diff attached for reference. >> >> The second quirk of AMD hw is that the vblank interrupt fires a few >> scanlines before start of vblank, so drm_handle_vblank -> >> drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets called >> before the start of the vblank for which the new vblank count should be >> queried. >> >> The third problem is that the DRM vblank handling always had the assumption >> that hardware vblank counters would essentially increment at leading edge of >> vblank - basically in sync with the firing of the vblank irq, so that a hw >> counter readout from within the vblank irq handler would always deliver the >> new incremented value. If this assumption is violated then the counting by >> use of the hw counter gets unreliable, because depending on random small >> delays in irq handling the code may end up sampling the hw counter pre- or >> post-increment, leading to inconsistent updating and funky bugs. It just so >> happens that AMD hardware doesn't increment the hw counter at leading edge >> of vblank, so stuff falls apart. >> >> So to fix those two problems i'm tinkering with cooking the hw vblank >> counter value returned by radeon_get_vblank_counter_kms() to make it appear >> as if the counter incremented at leading edge of vblank in sync with vblank >> irq. >> >> It almost sort of works on the rs600 code path, but i need a bit of info >> from you: >> >> 1. There's this register from the old specs for m76.pdf, which is not part >> of the current register defines for radeon-kms: >> >> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]" >> >> It contains the lower 16 bits of framecounter and the 13 bits of vertical >> scanout position. It seems to give the same readings as the 24 bit >> R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This would >> come handy. >> >> Does Evergreen and later have a same/similar register and where is it? > > Yes. CRTC_STATUS_VF_COUNT > > CRTC:CRTC_STATUS_VF_COUNT · [R/W] · 32 bits · Access: 8/16/32 · > [INST0] GpuF0MMReg:0x6e9c, [INST1] GpuF0MMReg:0x7a9c, [INST2] > GpuF0MMReg:0x1069c, [INST3] GpuF0MMReg:0x1129c, [INST4] > GpuF0MMReg:0x11e9c, [INST5] GpuF0MMReg:0x12a9c > DESCRIPTION: Current composite vertical and frame count for CRTC > Field Name Bits Default Description > CRTC_VF_COUNT > (Access: R) 28:0 0x0 Reports current vertical and frame count > >> >> 2. The hw framecounter seems to increment when the vertical scanout position >> wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3 gpu i tested so >> far. Is this so on all asics? And is the hw counter increment happening >> exactly at the moment that vertical scanout position jumps back to zero, ie. >> both events are driven by the same signal? Or is the framecounter increment >> just happening somewhere inside either scanline VTOTAL-1 or scanline 0? > > Unless Harry knows, we can ask the hw team, but I doubt they would > have changed it. I think it's tied to the start line which is > controlled by this register: > CRTC:CRTC_START_LINE_CONTROL · [R/W] · 32 bits · Access: 8/16/32 > · [INST0] GpuF0MMReg:0x6ecc, [INST1] GpuF0MMReg:0x7acc, [INST2] > GpuF0MMReg:0x106cc, [INST3] GpuF0MMReg:0x112cc, [INST4] > GpuF0MMReg:0x11ecc, [INST5] GpuF0MMReg:0x12acc > DESCRIPTION: move start_line signal earlier by 1 line in CRTC > Field Name Bits Default Description > CRTC_PROGRESSIVE_START_LINE_EARLY 0 0x0 move start_line > signal by 1 line eariler in progressive mode > > CRTC_INTERLACE_START_LINE_EARLY 8 0x1 move start_line > signal by 1 line earlier in interlaced timing mode > > CRTC_ADVANCED_START_LINE_POSITION 19:16 0x4 Advanced > Start Line position respect to not VBLANK. Default of 4 means the > Advanced Start Line is 4 lines before the first non VBLANK line > > The following info I dug up internally may be useful: > > The position of the CRTC output timing generator when the “start of > frame” indicator occurs depends on several conditions. These > conditions are whether the timing generator is outputting timing for a > progressive or interlaced display mode, whether the scaler is enabled, > and if so, whether the register to select an earlier than normal > “start of frame” indicator is set. The “start of frame” indicator > typically occurs 2 lines before the vertical blank end position > (DxCRTC_V_BLANK_END) is reached > > When interlaced output display modes are enabled > (DxCRTC_INTERLACE_ENABLE = 1) and the CRTC timing generator is enabled > (DxCRTC_MASTER_EN = 1), the timing generator’s vertical counter counts > by 2 for both the even fields and odd fields of each frame. For both > the even fields and the odd fields of interlaced output modes, the > “start of frame” indicator occurs 2 lines before the end of the > vertical blank occurs (DxCRTC_V_BLANK_END – 4 or DxCRTC_V_BLANK_END – > 3 depending on the field since the vertical counter counts by 2 in > interlaced), since the vertical counter counts by 2 in this mode). > There is one exception to the previous statement. If scaling is > enabled (DxSCL_SCALE_EN = 1) and the early start line is enabled > (DxCRTC_INTERLACE_START_LINE_EARLY = 1) in interlaced output mode then > the “start of frame” indicator happens 3 lines before the end of the > vertical blank (DxCRTC_V_BLANK_END – 6 or DxCRTC_V_BLANK_END – 5 > depending on the field since the vertical counter counts by 2 in > interlaced). > > When progressive output display modes are enabled > (DxCRTC_INTERLACE_ENABLE = 0) and the CRTC timing generator is enabled > (DxCRTC_MASTER_EN = 1), the “start of frame” indicator occurs 2 lines > before the end of the vertical blank occurs (DxCRTC_V_BLANK_END - 2). > Similar to interlaced output mode, there is one exception to the > previous statement. If scaling is enabled (DxSCL_SCALE_EN = 1) and > the early start line is enabled (DxCRTC_PROGRESSIVE_START_LINE_EARLY = > 1) in progressive output mode then the “start of frame” indicator > happens 3 lines before the end of the vertical blank > (DxCRTC_V_BLANK_END – 3) > > I hope this helps, > > Alex > >> >> >> If we can fix this and get it into rc2 or rc3 then we could avoid a bad >> regression and with a bit of luck at the same time improve by being able to >> set dev->vblank_disable_immediate = true then and allow vblank irqs to get >> turned off more aggressively for a bit of extra power saving. >> >> thanks, >> -mario diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c index 7f33767..4357f8f 100644 --- a/drivers/gpu/drm/radeon/evergreen.c +++ b/drivers/gpu/drm/radeon/evergreen.c @@ -4377,6 +4377,10 @@ int evergreen_rlc_resume(struct radeon_device *rdev) u32 evergreen_get_vblank_counter(struct radeon_device *rdev, int crtc) { + u32 adl = RREG32(0x6ecc + crtc_offsets[crtc]); + u32 fvr = RREG32(CRTC_STATUS_FRAME_COUNT + 4 + crtc_offsets[crtc]); + DRM_DEBUG_VBL("crtc %d: adl0: %d adl8: %d adls: %d Fd %d Vd %d\n", crtc, adl & (1 << 0), adl & (1 << 8), (adl >> 16) & 0xf, fvr & 0xffff, fvr >> 16); + if (crtc >= rdev->num_crtc) return 0; else diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c index a8d9927..76b3449 100644 --- a/drivers/gpu/drm/radeon/radeon_display.c +++ b/drivers/gpu/drm/radeon/radeon_display.c @@ -1910,11 +1910,22 @@ int radeon_get_crtc_scanoutpos(struct drm_device *dev, unsigned int pipe, vbl_start = mode->crtc_vdisplay; vbl_end = 0; } - + DRM_DEBUG_VBL("crtc %d: vpos %d : vs %d , ve %d, vs2 %d, vt %d\n", pipe, *vpos, vbl_start, vbl_end, mode->crtc_vdisplay, mode->crtc_vtotal); /* Test scanout position against vblank region. */ if ((*vpos < vbl_start) && (*vpos >= vbl_end)) in_vbl = false; + /* In vblank? */ + if (in_vbl) + ret |= DRM_SCANOUTPOS_IN_VBLANK; + + /* Called from driver internal vblank counter query code? */ + if (flags & 0x80000000) { + /* Caller wants distance from vbl_start */ + *vpos -= vbl_start; + return ret; + } + /* Check if inside vblank area and apply corrective offsets: * vpos will then be >=0 in video scanout area, but negative * within vblank area, counting down the number of lines until @@ -1930,10 +1941,6 @@ int radeon_get_crtc_scanoutpos(struct drm_device *dev, unsigned int pipe, /* Correct for shifted end of vbl at vbl_end. */ *vpos = *vpos - vbl_end; - /* In vblank? */ - if (in_vbl) - ret |= DRM_SCANOUTPOS_IN_VBLANK; - /* Is vpos outside nominal vblank area, but less than * 1/100 of a frame height away from start of vblank? * If so, assume this isn't a massively delayed vblank diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c index 0ec6fcc..885f934 100644 --- a/drivers/gpu/drm/radeon/radeon_kms.c +++ b/drivers/gpu/drm/radeon/radeon_kms.c @@ -755,6 +755,8 @@ void radeon_driver_preclose_kms(struct drm_device *dev, */ u32 radeon_get_vblank_counter_kms(struct drm_device *dev, int crtc) { + int vpos, hpos, stat, bump_line; + u32 count; struct radeon_device *rdev = dev->dev_private; if (crtc < 0 || crtc >= rdev->num_crtc) { @@ -762,7 +764,57 @@ u32 radeon_get_vblank_counter_kms(struct drm_device *dev, int crtc) return -EINVAL; } - return radeon_get_vblank_counter(rdev, crtc); + /* If called from hw vblank irq then we need hw quirk treatment. + * The hw fires its vblank irq typically 2-3 scanlines before + * the vblank, triggering sampling of our hw counter by DRM + * core before it increments at start of vblank, which is bad + * for vblank counting and timestamping, causing frequent + * off-by-one errors. Try to cook the hw count here to make it + * appear to the caller as if it was sampled after it incremented. + */ + if (rdev->mode_info.crtcs[crtc]) { + do { + count = radeon_get_vblank_counter(rdev, crtc); + /* Ask function to return vpos as distance to start of + * vblank, instead of regular vertical scanout pos. + */ + stat = radeon_get_crtc_scanoutpos( + dev, crtc, + 0x80000000, &vpos, &hpos, NULL, NULL, + &rdev->mode_info.crtcs[crtc]->base.hwmode); + } while (count != radeon_get_vblank_counter(rdev, crtc)); + + if (((stat & (DRM_SCANOUTPOS_VALID | DRM_SCANOUTPOS_ACCURATE)) != + (DRM_SCANOUTPOS_VALID | DRM_SCANOUTPOS_ACCURATE))) { + DRM_DEBUG_VBL("Query failed! stat %d\n", stat); + } + else { + /* Bump counter if we are at >= leading edge of vblank, + * but before wraparound to zero where the hw counter + * increments. Bump counter already 10 scanlines before + * vblank if we are called from vblank irq handler, as + * that one fires a few scanlines before vblank but + * should get a count as if it was called inside vblank. + */ + bump_line = in_irq() ? -10 : 0; + if (vpos >= bump_line) { + count++; + DRM_DEBUG_VBL("crtc %d: vpos %d >= bump_line %d -> Bumped.\n", + crtc, vpos, bump_line); + } + else { + DRM_DEBUG_VBL("crtc %d: vpos %d\n", + crtc, vpos); + } + } + } + else { + /* Fallback to use value as is. */ + count = radeon_get_vblank_counter(rdev, crtc); + DRM_DEBUG_VBL("NULL mode info!\n"); + } + + return count; } /** diff --git a/drivers/gpu/drm/radeon/rs600.c b/drivers/gpu/drm/radeon/rs600.c index 97a9048..460ca2f 100644 --- a/drivers/gpu/drm/radeon/rs600.c +++ b/drivers/gpu/drm/radeon/rs600.c @@ -835,6 +835,9 @@ int rs600_irq_process(struct radeon_device *rdev) u32 rs600_get_vblank_counter(struct radeon_device *rdev, int crtc) { + u32 fvr = RREG32((crtc == 0) ? (R_0060A4_D1CRTC_STATUS_FRAME_COUNT + 4) : (R_0068A4_D2CRTC_STATUS_FRAME_COUNT + 4)); + DRM_DEBUG_VBL("crtc %d: Fd%d Vd %d\n", crtc, fvr & 0xffff, fvr >> 16); + if (crtc == 0) return RREG32(R_0060A4_D1CRTC_STATUS_FRAME_COUNT); else