Message ID | 7251833ab9439f4e34ba3fb2c5daf6c9e01b6551.1573698927.git.sbobroff@linux.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC,1/1] drm/radeon: fix bad DMA from INTERRUPT_CNTL2 | expand |
On Wed, Nov 13, 2019 at 9:53 PM Sam Bobroff <sbobroff@linux.ibm.com> wrote: > > Currently, binding the radeon driver to a Radeon FirePro (PCI > 1002:68f2) that has been passed through to a guest on a Power8 machine > causes a bad DMA read which causes the device to be frozen, leading to > the driver failing to bind and other problems. > > The bad DMA is to the address written into the INTERRUPT_CNTL2 > register during r600_irq_init(), and it can be fixed by substituting a > valid (dummy) DMA address. The patch is correct. INTERRUPT_CNTL2 takes a bus address not a GPU MC address. Care to update si.c and cik.c as well? Thanks, Alex > --- > Hello, > > I've been tracking down a bug, described above, and I've been able to hack > in a workaround but I could use some help understanding how to fix it > properly. > > What seems to be happening is that when the first CRTC is enabled (when > evergreen_irq_set() writes to GRPH_INT_CONTROL) the device immediately > performs a DMA read from the address that's been programmed into > INTERRUPT_CNTL2 by r600_irq_init(). > > That address isn't a valid DMA address, so it triggers the problem. The > address comes from rdev->ih.gpu_addr, which seems to be a 'linear GPU > address', calculated from the size of the card's VRAM. It definitely hasn't > come from a DMA mapping operation. > > Based on the nearby comment, "set dummy read address to ring address", I > tried substituting a valid dummy DMA address (using the address mapped to > the 'dummy page' used by the GART) and it does prevent the problem. > However, I don't know what that register is supposed to do or what > information the device might be communicating via that DMA, so presumably > it breaks something. > > Note: I have tried loading the patched driver with "test=1" and all the > self tests succeed. > > Could anyone offer some insight into this problem? > What does that register do? What kind of address is it expecting? > What might be a good way of fixing it? > > Thanks in advance, > Sam. > > drivers/gpu/drm/radeon/r600.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c > index e937cc01910d..033bc466a862 100644 > --- a/drivers/gpu/drm/radeon/r600.c > +++ b/drivers/gpu/drm/radeon/r600.c > @@ -3696,8 +3696,8 @@ int r600_irq_init(struct radeon_device *rdev) > } > > /* setup interrupt control */ > - /* set dummy read address to ring address */ > - WREG32(INTERRUPT_CNTL2, rdev->ih.gpu_addr >> 8); > + /* set dummy read address to dummy page address */ > + WREG32(INTERRUPT_CNTL2, rdev->dummy_page.addr >> 8); > interrupt_cntl = RREG32(INTERRUPT_CNTL); > /* IH_DUMMY_RD_OVERRIDE=0 - dummy read disabled with msi, enabled without msi > * IH_DUMMY_RD_OVERRIDE=1 - dummy read controlled by IH_DUMMY_RD_EN > -- > 2.22.0.216.g00a2a96fc9 > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel
On Thu, Nov 14, 2019 at 12:05:19AM -0500, Alex Deucher wrote: > On Wed, Nov 13, 2019 at 9:53 PM Sam Bobroff <sbobroff@linux.ibm.com> wrote: > > > > Currently, binding the radeon driver to a Radeon FirePro (PCI > > 1002:68f2) that has been passed through to a guest on a Power8 machine > > causes a bad DMA read which causes the device to be frozen, leading to > > the driver failing to bind and other problems. > > > > The bad DMA is to the address written into the INTERRUPT_CNTL2 > > register during r600_irq_init(), and it can be fixed by substituting a > > valid (dummy) DMA address. > > The patch is correct. INTERRUPT_CNTL2 takes a bus address not a GPU > MC address. Care to update si.c and cik.c as well? Sure (although I can't test those cases). I see that there's one other case in si_ih.c (although it's for amdgpu, not radeon); should I include that as well? > Thanks, > > Alex > > > --- > > Hello, > > > > I've been tracking down a bug, described above, and I've been able to hack > > in a workaround but I could use some help understanding how to fix it > > properly. > > > > What seems to be happening is that when the first CRTC is enabled (when > > evergreen_irq_set() writes to GRPH_INT_CONTROL) the device immediately > > performs a DMA read from the address that's been programmed into > > INTERRUPT_CNTL2 by r600_irq_init(). > > > > That address isn't a valid DMA address, so it triggers the problem. The > > address comes from rdev->ih.gpu_addr, which seems to be a 'linear GPU > > address', calculated from the size of the card's VRAM. It definitely hasn't > > come from a DMA mapping operation. > > > > Based on the nearby comment, "set dummy read address to ring address", I > > tried substituting a valid dummy DMA address (using the address mapped to > > the 'dummy page' used by the GART) and it does prevent the problem. > > However, I don't know what that register is supposed to do or what > > information the device might be communicating via that DMA, so presumably > > it breaks something. > > > > Note: I have tried loading the patched driver with "test=1" and all the > > self tests succeed. > > > > Could anyone offer some insight into this problem? > > What does that register do? What kind of address is it expecting? > > What might be a good way of fixing it? > > > > Thanks in advance, > > Sam. > > > > drivers/gpu/drm/radeon/r600.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c > > index e937cc01910d..033bc466a862 100644 > > --- a/drivers/gpu/drm/radeon/r600.c > > +++ b/drivers/gpu/drm/radeon/r600.c > > @@ -3696,8 +3696,8 @@ int r600_irq_init(struct radeon_device *rdev) > > } > > > > /* setup interrupt control */ > > - /* set dummy read address to ring address */ > > - WREG32(INTERRUPT_CNTL2, rdev->ih.gpu_addr >> 8); > > + /* set dummy read address to dummy page address */ > > + WREG32(INTERRUPT_CNTL2, rdev->dummy_page.addr >> 8); > > interrupt_cntl = RREG32(INTERRUPT_CNTL); > > /* IH_DUMMY_RD_OVERRIDE=0 - dummy read disabled with msi, enabled without msi > > * IH_DUMMY_RD_OVERRIDE=1 - dummy read controlled by IH_DUMMY_RD_EN > > -- > > 2.22.0.216.g00a2a96fc9 > > > > _______________________________________________ > > dri-devel mailing list > > dri-devel@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
On Thu, Nov 14, 2019 at 6:08 PM Sam Bobroff <sbobroff@linux.ibm.com> wrote: > > On Thu, Nov 14, 2019 at 12:05:19AM -0500, Alex Deucher wrote: > > On Wed, Nov 13, 2019 at 9:53 PM Sam Bobroff <sbobroff@linux.ibm.com> wrote: > > > > > > Currently, binding the radeon driver to a Radeon FirePro (PCI > > > 1002:68f2) that has been passed through to a guest on a Power8 machine > > > causes a bad DMA read which causes the device to be frozen, leading to > > > the driver failing to bind and other problems. > > > > > > The bad DMA is to the address written into the INTERRUPT_CNTL2 > > > register during r600_irq_init(), and it can be fixed by substituting a > > > valid (dummy) DMA address. > > > > The patch is correct. INTERRUPT_CNTL2 takes a bus address not a GPU > > MC address. Care to update si.c and cik.c as well? > > Sure (although I can't test those cases). I see that there's one other > case in si_ih.c (although it's for amdgpu, not radeon); should I include > that as well? > Yes, please. thanks, Alex > > Thanks, > > > > Alex > > > > > --- > > > Hello, > > > > > > I've been tracking down a bug, described above, and I've been able to hack > > > in a workaround but I could use some help understanding how to fix it > > > properly. > > > > > > What seems to be happening is that when the first CRTC is enabled (when > > > evergreen_irq_set() writes to GRPH_INT_CONTROL) the device immediately > > > performs a DMA read from the address that's been programmed into > > > INTERRUPT_CNTL2 by r600_irq_init(). > > > > > > That address isn't a valid DMA address, so it triggers the problem. The > > > address comes from rdev->ih.gpu_addr, which seems to be a 'linear GPU > > > address', calculated from the size of the card's VRAM. It definitely hasn't > > > come from a DMA mapping operation. > > > > > > Based on the nearby comment, "set dummy read address to ring address", I > > > tried substituting a valid dummy DMA address (using the address mapped to > > > the 'dummy page' used by the GART) and it does prevent the problem. > > > However, I don't know what that register is supposed to do or what > > > information the device might be communicating via that DMA, so presumably > > > it breaks something. > > > > > > Note: I have tried loading the patched driver with "test=1" and all the > > > self tests succeed. > > > > > > Could anyone offer some insight into this problem? > > > What does that register do? What kind of address is it expecting? > > > What might be a good way of fixing it? > > > > > > Thanks in advance, > > > Sam. > > > > > > drivers/gpu/drm/radeon/r600.c | 4 ++-- > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c > > > index e937cc01910d..033bc466a862 100644 > > > --- a/drivers/gpu/drm/radeon/r600.c > > > +++ b/drivers/gpu/drm/radeon/r600.c > > > @@ -3696,8 +3696,8 @@ int r600_irq_init(struct radeon_device *rdev) > > > } > > > > > > /* setup interrupt control */ > > > - /* set dummy read address to ring address */ > > > - WREG32(INTERRUPT_CNTL2, rdev->ih.gpu_addr >> 8); > > > + /* set dummy read address to dummy page address */ > > > + WREG32(INTERRUPT_CNTL2, rdev->dummy_page.addr >> 8); > > > interrupt_cntl = RREG32(INTERRUPT_CNTL); > > > /* IH_DUMMY_RD_OVERRIDE=0 - dummy read disabled with msi, enabled without msi > > > * IH_DUMMY_RD_OVERRIDE=1 - dummy read controlled by IH_DUMMY_RD_EN > > > -- > > > 2.22.0.216.g00a2a96fc9 > > > > > > _______________________________________________ > > > dri-devel mailing list > > > dri-devel@lists.freedesktop.org > > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c index e937cc01910d..033bc466a862 100644 --- a/drivers/gpu/drm/radeon/r600.c +++ b/drivers/gpu/drm/radeon/r600.c @@ -3696,8 +3696,8 @@ int r600_irq_init(struct radeon_device *rdev) } /* setup interrupt control */ - /* set dummy read address to ring address */ - WREG32(INTERRUPT_CNTL2, rdev->ih.gpu_addr >> 8); + /* set dummy read address to dummy page address */ + WREG32(INTERRUPT_CNTL2, rdev->dummy_page.addr >> 8); interrupt_cntl = RREG32(INTERRUPT_CNTL); /* IH_DUMMY_RD_OVERRIDE=0 - dummy read disabled with msi, enabled without msi * IH_DUMMY_RD_OVERRIDE=1 - dummy read controlled by IH_DUMMY_RD_EN