mbox series

[0/8] Convert the intel iommu driver to the dma-iommu api

Message ID 20191221150402.13868-1-murphyt7@tcd.ie (mailing list archive)
Headers show
Series Convert the intel iommu driver to the dma-iommu api | expand

Message

Tom Murphy Dec. 21, 2019, 3:03 p.m. UTC
This patchset converts the intel iommu driver to the dma-iommu api.

While converting the driver I exposed a bug in the intel i915 driver which causes a huge amount of artifacts on the screen of my laptop. You can see a picture of it here:
https://github.com/pippy360/kernelPatches/blob/master/IMG_20191219_225922.jpg

This issue is most likely in the i915 driver and is most likely caused by the driver not respecting the return value of the dma_map_ops::map_sg function. You can see the driver ignoring the return value here:
https://github.com/torvalds/linux/blob/7e0165b2f1a912a06e381e91f0f4e495f4ac3736/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c#L51

Previously this didn’t cause issues because the intel map_sg always returned the same number of elements as the input scatter gather list but with the change to this dma-iommu api this is no longer the case. I wasn’t able to track the bug down to a specific line of code unfortunately.  

Could someone from the intel team look at this?


I have been testing on a lenovo x1 carbon 5th generation. Let me know if there’s any more information you need.

To allow my patch set to be tested I have added a patch (patch 8/8) in this series to disable combining sg segments in the dma-iommu api which fixes the bug but it doesn't fix the actual problem.

As part of this patch series I copied the intel bounce buffer code to the dma-iommu path. The addition of the bounce buffer code took me by surprise. I did most of my development on this patch series before the bounce buffer code was added and my reimplementation in the dma-iommu path is very rushed and not properly tested but I’m running out of time to work on this patch set.

On top of that I also didn’t port over the intel tracing code from this commit:
https://github.com/torvalds/linux/commit/3b53034c268d550d9e8522e613a14ab53b8840d8#diff-6b3e7c4993f05e76331e463ab1fc87e1
So all the work in that commit is now wasted. The code will need to be removed and reimplemented in the dma-iommu path. I would like to take the time to do this but I really don’t have the time at the moment and I want to get these changes out before the iommu code changes any more.


Tom Murphy (8):
  iommu/vt-d: clean up 32bit si_domain assignment
  iommu/vt-d: Use default dma_direct_* mapping functions for direct
    mapped devices
  iommu/vt-d: Remove IOVA handling code from non-dma_ops path
  iommu: Handle freelists when using deferred flushing in iommu drivers
  iommu: Add iommu_dma_free_cpu_cached_iovas function
  iommu: allow the dma-iommu api to use bounce buffers
  iommu/vt-d: Convert intel iommu driver to the iommu ops
  DO NOT MERGE: iommu: disable list appending in dma-iommu

 drivers/iommu/Kconfig           |   1 +
 drivers/iommu/amd_iommu.c       |  14 +-
 drivers/iommu/arm-smmu-v3.c     |   3 +-
 drivers/iommu/arm-smmu.c        |   3 +-
 drivers/iommu/dma-iommu.c       | 183 +++++--
 drivers/iommu/exynos-iommu.c    |   3 +-
 drivers/iommu/intel-iommu.c     | 936 ++++----------------------------
 drivers/iommu/iommu.c           |  39 +-
 drivers/iommu/ipmmu-vmsa.c      |   3 +-
 drivers/iommu/msm_iommu.c       |   3 +-
 drivers/iommu/mtk_iommu.c       |   3 +-
 drivers/iommu/mtk_iommu_v1.c    |   3 +-
 drivers/iommu/omap-iommu.c      |   3 +-
 drivers/iommu/qcom_iommu.c      |   3 +-
 drivers/iommu/rockchip-iommu.c  |   3 +-
 drivers/iommu/s390-iommu.c      |   3 +-
 drivers/iommu/tegra-gart.c      |   3 +-
 drivers/iommu/tegra-smmu.c      |   3 +-
 drivers/iommu/virtio-iommu.c    |   3 +-
 drivers/vfio/vfio_iommu_type1.c |   2 +-
 include/linux/dma-iommu.h       |   3 +
 include/linux/intel-iommu.h     |   1 -
 include/linux/iommu.h           |  32 +-
 23 files changed, 345 insertions(+), 908 deletions(-)

Comments

Jani Nikula Dec. 23, 2019, 10:37 a.m. UTC | #1
On Sat, 21 Dec 2019, Tom Murphy <murphyt7@tcd.ie> wrote:
> This patchset converts the intel iommu driver to the dma-iommu api.
>
> While converting the driver I exposed a bug in the intel i915 driver
> which causes a huge amount of artifacts on the screen of my
> laptop. You can see a picture of it here:
> https://github.com/pippy360/kernelPatches/blob/master/IMG_20191219_225922.jpg
>
> This issue is most likely in the i915 driver and is most likely caused
> by the driver not respecting the return value of the
> dma_map_ops::map_sg function. You can see the driver ignoring the
> return value here:
> https://github.com/torvalds/linux/blob/7e0165b2f1a912a06e381e91f0f4e495f4ac3736/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c#L51
>
> Previously this didn’t cause issues because the intel map_sg always
> returned the same number of elements as the input scatter gather list
> but with the change to this dma-iommu api this is no longer the
> case. I wasn’t able to track the bug down to a specific line of code
> unfortunately.
>
> Could someone from the intel team look at this?

Let me get this straight. There is current API that on success always
returns the same number of elements as the input scatter gather
list. You propose to change the API so that this is no longer the case?

A quick check of various dma_map_sg() calls in the kernel seems to
indicate checking for 0 for errors and then ignoring the non-zero return
is a common pattern. Are you sure it's okay to make the change you're
proposing?

Anyway, due to the time of year and all, I'd like to ask you to file a
bug against i915 at [1] so this is not forgotten, and please let's not
merge the changes before this is resolved.


Thanks,
Jani.


[1] https://gitlab.freedesktop.org/drm/intel/issues/new
Robin Murphy Dec. 23, 2019, 11:29 a.m. UTC | #2
On 2019-12-23 10:37 am, Jani Nikula wrote:
> On Sat, 21 Dec 2019, Tom Murphy <murphyt7@tcd.ie> wrote:
>> This patchset converts the intel iommu driver to the dma-iommu api.
>>
>> While converting the driver I exposed a bug in the intel i915 driver
>> which causes a huge amount of artifacts on the screen of my
>> laptop. You can see a picture of it here:
>> https://github.com/pippy360/kernelPatches/blob/master/IMG_20191219_225922.jpg
>>
>> This issue is most likely in the i915 driver and is most likely caused
>> by the driver not respecting the return value of the
>> dma_map_ops::map_sg function. You can see the driver ignoring the
>> return value here:
>> https://github.com/torvalds/linux/blob/7e0165b2f1a912a06e381e91f0f4e495f4ac3736/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c#L51
>>
>> Previously this didn’t cause issues because the intel map_sg always
>> returned the same number of elements as the input scatter gather list
>> but with the change to this dma-iommu api this is no longer the
>> case. I wasn’t able to track the bug down to a specific line of code
>> unfortunately.
>>
>> Could someone from the intel team look at this?
> 
> Let me get this straight. There is current API that on success always
> returns the same number of elements as the input scatter gather
> list. You propose to change the API so that this is no longer the case?

No, the API for dma_map_sg() has always been that it may return fewer 
DMA segments than nents - see Documentation/DMA-API.txt (and otherwise, 
the return value would surely be a simple success/fail condition). 
Relying on a particular implementation behaviour has never been strictly 
correct, even if it does happen to be a very common behaviour.

> A quick check of various dma_map_sg() calls in the kernel seems to
> indicate checking for 0 for errors and then ignoring the non-zero return
> is a common pattern. Are you sure it's okay to make the change you're
> proposing?

Various code uses tricks like just iterating the mapped list until the 
first segment with zero sg_dma_len(). Others may well simply have bugs.

Robin.

> Anyway, due to the time of year and all, I'd like to ask you to file a
> bug against i915 at [1] so this is not forgotten, and please let's not
> merge the changes before this is resolved.
> 
> 
> Thanks,
> Jani.
> 
> 
> [1] https://gitlab.freedesktop.org/drm/intel/issues/new
> 
>
Jani Nikula Dec. 23, 2019, 11:41 a.m. UTC | #3
On Mon, 23 Dec 2019, Robin Murphy <robin.murphy@arm.com> wrote:
> On 2019-12-23 10:37 am, Jani Nikula wrote:
>> On Sat, 21 Dec 2019, Tom Murphy <murphyt7@tcd.ie> wrote:
>>> This patchset converts the intel iommu driver to the dma-iommu api.
>>>
>>> While converting the driver I exposed a bug in the intel i915 driver
>>> which causes a huge amount of artifacts on the screen of my
>>> laptop. You can see a picture of it here:
>>> https://github.com/pippy360/kernelPatches/blob/master/IMG_20191219_225922.jpg
>>>
>>> This issue is most likely in the i915 driver and is most likely caused
>>> by the driver not respecting the return value of the
>>> dma_map_ops::map_sg function. You can see the driver ignoring the
>>> return value here:
>>> https://github.com/torvalds/linux/blob/7e0165b2f1a912a06e381e91f0f4e495f4ac3736/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c#L51
>>>
>>> Previously this didn’t cause issues because the intel map_sg always
>>> returned the same number of elements as the input scatter gather list
>>> but with the change to this dma-iommu api this is no longer the
>>> case. I wasn’t able to track the bug down to a specific line of code
>>> unfortunately.
>>>
>>> Could someone from the intel team look at this?
>> 
>> Let me get this straight. There is current API that on success always
>> returns the same number of elements as the input scatter gather
>> list. You propose to change the API so that this is no longer the case?
>
> No, the API for dma_map_sg() has always been that it may return fewer 
> DMA segments than nents - see Documentation/DMA-API.txt (and otherwise, 
> the return value would surely be a simple success/fail condition). 
> Relying on a particular implementation behaviour has never been strictly 
> correct, even if it does happen to be a very common behaviour.
>
>> A quick check of various dma_map_sg() calls in the kernel seems to
>> indicate checking for 0 for errors and then ignoring the non-zero return
>> is a common pattern. Are you sure it's okay to make the change you're
>> proposing?
>
> Various code uses tricks like just iterating the mapped list until the 
> first segment with zero sg_dma_len(). Others may well simply have bugs.

Thanks for the clarification.

BR,
Jani.

>
> Robin.
>
>> Anyway, due to the time of year and all, I'd like to ask you to file a
>> bug against i915 at [1] so this is not forgotten, and please let's not
>> merge the changes before this is resolved.
>> 
>> 
>> Thanks,
>> Jani.
>> 
>> 
>> [1] https://gitlab.freedesktop.org/drm/intel/issues/new
>> 
>>
Tom Murphy March 20, 2020, 6:28 a.m. UTC | #4
Any news on this? Is there anyone who wants to try and fix this possible bug?

On Mon, 23 Dec 2019 at 03:41, Jani Nikula <jani.nikula@linux.intel.com> wrote:
>
> On Mon, 23 Dec 2019, Robin Murphy <robin.murphy@arm.com> wrote:
> > On 2019-12-23 10:37 am, Jani Nikula wrote:
> >> On Sat, 21 Dec 2019, Tom Murphy <murphyt7@tcd.ie> wrote:
> >>> This patchset converts the intel iommu driver to the dma-iommu api.
> >>>
> >>> While converting the driver I exposed a bug in the intel i915 driver
> >>> which causes a huge amount of artifacts on the screen of my
> >>> laptop. You can see a picture of it here:
> >>> https://github.com/pippy360/kernelPatches/blob/master/IMG_20191219_225922.jpg
> >>>
> >>> This issue is most likely in the i915 driver and is most likely caused
> >>> by the driver not respecting the return value of the
> >>> dma_map_ops::map_sg function. You can see the driver ignoring the
> >>> return value here:
> >>> https://github.com/torvalds/linux/blob/7e0165b2f1a912a06e381e91f0f4e495f4ac3736/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c#L51
> >>>
> >>> Previously this didn’t cause issues because the intel map_sg always
> >>> returned the same number of elements as the input scatter gather list
> >>> but with the change to this dma-iommu api this is no longer the
> >>> case. I wasn’t able to track the bug down to a specific line of code
> >>> unfortunately.
> >>>
> >>> Could someone from the intel team look at this?
> >>
> >> Let me get this straight. There is current API that on success always
> >> returns the same number of elements as the input scatter gather
> >> list. You propose to change the API so that this is no longer the case?
> >
> > No, the API for dma_map_sg() has always been that it may return fewer
> > DMA segments than nents - see Documentation/DMA-API.txt (and otherwise,
> > the return value would surely be a simple success/fail condition).
> > Relying on a particular implementation behaviour has never been strictly
> > correct, even if it does happen to be a very common behaviour.
> >
> >> A quick check of various dma_map_sg() calls in the kernel seems to
> >> indicate checking for 0 for errors and then ignoring the non-zero return
> >> is a common pattern. Are you sure it's okay to make the change you're
> >> proposing?
> >
> > Various code uses tricks like just iterating the mapped list until the
> > first segment with zero sg_dma_len(). Others may well simply have bugs.
>
> Thanks for the clarification.
>
> BR,
> Jani.
>
> >
> > Robin.
> >
> >> Anyway, due to the time of year and all, I'd like to ask you to file a
> >> bug against i915 at [1] so this is not forgotten, and please let's not
> >> merge the changes before this is resolved.
> >>
> >>
> >> Thanks,
> >> Jani.
> >>
> >>
> >> [1] https://gitlab.freedesktop.org/drm/intel/issues/new
> >>
> >>
>
> --
> Jani Nikula, Intel Open Source Graphics Center
Logan Gunthorpe May 29, 2020, midnight UTC | #5
Hi Tom,

On 2019-12-21 8:03 a.m., Tom Murphy wrote:
> This patchset converts the intel iommu driver to the dma-iommu api.

Just wanted to note that I've rebased your series on recent kernels and
have done some testing on my old Sandybridge machine (without the DO NOT
MERGE patch) and have found no issues. I hope this can make progress
soon and get merged soon. If you like you can add:

Tested-By: Logan Gunthorpe <logang@deltatee.com>

> While converting the driver I exposed a bug in the intel i915 driver which causes a huge amount of artifacts on the screen of my laptop. You can see a picture of it here:
> https://github.com/pippy360/kernelPatches/blob/master/IMG_20191219_225922.jpg
> 
> This issue is most likely in the i915 driver and is most likely caused by the driver not respecting the return value of the dma_map_ops::map_sg function. You can see the driver ignoring the return value here:
> https://github.com/torvalds/linux/blob/7e0165b2f1a912a06e381e91f0f4e495f4ac3736/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c#L51
> 
> Previously this didn’t cause issues because the intel map_sg always returned the same number of elements as the input scatter gather list but with the change to this dma-iommu api this is no longer the case. I wasn’t able to track the bug down to a specific line of code unfortunately.  

I did some digging into this myself and while I don't have full patch, I
think I traced it closer to the problem.

Sadly, ignoring the number of nents returned by map_sg() is endemic to
dma-buf users, but AMD's GPU driver seems to do the same thing,
presumably without issues.

Digging a bit further, I found that the i915 has an "innovative" way of
iterating through SGLs, see [1]. I suspect if __sgt_iter is changed to
increment with sg_dma_len() and return NULL when there is no length
left, it may fix the issue.

But, sorry, I don't really have the means or time to fix and test this
myself.

Thanks,

Logan

[1]
https://elixir.bootlin.com/linux/v5.7-rc7/source/drivers/gpu/drm/i915/i915_scatterlist.h#L76
Christoph Hellwig May 29, 2020, 12:45 p.m. UTC | #6
On Thu, May 28, 2020 at 06:00:44PM -0600, Logan Gunthorpe wrote:
> > This issue is most likely in the i915 driver and is most likely caused by the driver not respecting the return value of the dma_map_ops::map_sg function. You can see the driver ignoring the return value here:
> > https://github.com/torvalds/linux/blob/7e0165b2f1a912a06e381e91f0f4e495f4ac3736/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c#L51
> > 
> > Previously this didn’t cause issues because the intel map_sg always returned the same number of elements as the input scatter gather list but with the change to this dma-iommu api this is no longer the case. I wasn’t able to track the bug down to a specific line of code unfortunately.  

Mark did a big audit into the map_sg API abuse and initially had
some i915 patches, but then gave up on them with this comment:

"The biggest TODO is DRM/i915 driver and I don't feel brave enough to fix
 it fully. The driver creatively uses sg_table->orig_nents to store the
 size of the allocate scatterlist and ignores the number of the entries
 returned by dma_map_sg function. In this patchset I only fixed the
 sg_table objects exported by dmabuf related functions. I hope that I
 didn't break anything there."

it would be really nice if the i915 maintainers could help with sorting
that API abuse out.
Logan Gunthorpe May 29, 2020, 7:05 p.m. UTC | #7
On 2020-05-29 6:45 a.m., Christoph Hellwig wrote:
> On Thu, May 28, 2020 at 06:00:44PM -0600, Logan Gunthorpe wrote:
>>> This issue is most likely in the i915 driver and is most likely caused by the driver not respecting the return value of the dma_map_ops::map_sg function. You can see the driver ignoring the return value here:
>>> https://github.com/torvalds/linux/blob/7e0165b2f1a912a06e381e91f0f4e495f4ac3736/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c#L51
>>>
>>> Previously this didn’t cause issues because the intel map_sg always returned the same number of elements as the input scatter gather list but with the change to this dma-iommu api this is no longer the case. I wasn’t able to track the bug down to a specific line of code unfortunately.  
> 
> Mark did a big audit into the map_sg API abuse and initially had
> some i915 patches, but then gave up on them with this comment:
> 
> "The biggest TODO is DRM/i915 driver and I don't feel brave enough to fix
>  it fully. The driver creatively uses sg_table->orig_nents to store the
>  size of the allocate scatterlist and ignores the number of the entries
>  returned by dma_map_sg function. In this patchset I only fixed the
>  sg_table objects exported by dmabuf related functions. I hope that I
>  didn't break anything there."
> 
> it would be really nice if the i915 maintainers could help with sorting
> that API abuse out.
> 

I agree completely that the API abuse should be sorted out, but I think
that's much larger than just the i915 driver. Pretty much every dma-buf
map_dma_buf implementation I looked at ignores the returned nents of
sg_attrs. This sucks, but I don't think it's the bug Tom ran into. See:

amdgpu_dma_buf_map
armada_gem_prime_map_dma_buf
drm_gem_map_dma_buf
i915_gem_map_dma_buf
tegra_gem_prime_map_dma_buf

So this should probably be addressed by the whole GPU community.

However, as Robin pointed out, there are other ugly tricks like stopping
iterating through the SGL when sg_dma_len() is zero. For example, the
AMD driver appears to use drm_prime_sg_to_page_addr_arrays() which does
this trick and thus likely isn't buggy (otherwise, I'd expect someone to
have complained by now seeing AMD has already switched to IOMMU-DMA.

As I tried to point out in my previous email, i915 does not do this
trick. In fact, it completely ignores sg_dma_len() and is thus
completely broken. See i915_scatterlist.h and the __sgt_iter() function.
So it doesn't sound to me like Mark's fix would address the issue at
all. Per my previous email, I'd guess that it can be fixed simply by
adjusting the __sgt_iter() function to do something more sensible.
(Better yet, if possible, ditch __sgt_iter() and use the common DRM
function that AMD uses).

This would at least allow us to make progress with Tom's IOMMU-DMA patch
set and once that gets in, it will be harder for other drivers to make
the same mistake.

Logan
Marek Szyprowski May 29, 2020, 9:11 p.m. UTC | #8
Hi Logan,

On 29.05.2020 21:05, Logan Gunthorpe wrote:
> On 2020-05-29 6:45 a.m., Christoph Hellwig wrote:
>> On Thu, May 28, 2020 at 06:00:44PM -0600, Logan Gunthorpe wrote:
>>>> This issue is most likely in the i915 driver and is most likely caused by the driver not respecting the return value of the dma_map_ops::map_sg function. You can see the driver ignoring the return value here:
>>>> https://protect2.fireeye.com/url?k=ca25a34b-97f7b813-ca242804-0cc47a31c8b4-0ecdffc9f56851e1&q=1&u=https%3A%2F%2Fgithub.com%2Ftorvalds%2Flinux%2Fblob%2F7e0165b2f1a912a06e381e91f0f4e495f4ac3736%2Fdrivers%2Fgpu%2Fdrm%2Fi915%2Fgem%2Fi915_gem_dmabuf.c%23L51
>>>>
>>>> Previously this didn’t cause issues because the intel map_sg always returned the same number of elements as the input scatter gather list but with the change to this dma-iommu api this is no longer the case. I wasn’t able to track the bug down to a specific line of code unfortunately.
>> Mark did a big audit into the map_sg API abuse and initially had
>> some i915 patches, but then gave up on them with this comment:
>>
>> "The biggest TODO is DRM/i915 driver and I don't feel brave enough to fix
>>   it fully. The driver creatively uses sg_table->orig_nents to store the
>>   size of the allocate scatterlist and ignores the number of the entries
>>   returned by dma_map_sg function. In this patchset I only fixed the
>>   sg_table objects exported by dmabuf related functions. I hope that I
>>   didn't break anything there."
>>
>> it would be really nice if the i915 maintainers could help with sorting
>> that API abuse out.
>>
> I agree completely that the API abuse should be sorted out, but I think
> that's much larger than just the i915 driver. Pretty much every dma-buf
> map_dma_buf implementation I looked at ignores the returned nents of
> sg_attrs. This sucks, but I don't think it's the bug Tom ran into. See:
>
> amdgpu_dma_buf_map
> armada_gem_prime_map_dma_buf
> drm_gem_map_dma_buf
> i915_gem_map_dma_buf
> tegra_gem_prime_map_dma_buf
>
> So this should probably be addressed by the whole GPU community.

Patches are pending:
https://lore.kernel.org/linux-iommu/20200513132114.6046-1-m.szyprowski@samsung.com/T/

> However, as Robin pointed out, there are other ugly tricks like stopping
> iterating through the SGL when sg_dma_len() is zero. For example, the
> AMD driver appears to use drm_prime_sg_to_page_addr_arrays() which does
> this trick and thus likely isn't buggy (otherwise, I'd expect someone to
> have complained by now seeing AMD has already switched to IOMMU-DMA.

I'm not sure that this is a trick. Stopping at zero sg_dma_len() was 
somewhere documented.

> As I tried to point out in my previous email, i915 does not do this
> trick. In fact, it completely ignores sg_dma_len() and is thus
> completely broken. See i915_scatterlist.h and the __sgt_iter() function.
> So it doesn't sound to me like Mark's fix would address the issue at
> all. Per my previous email, I'd guess that it can be fixed simply by
> adjusting the __sgt_iter() function to do something more sensible.
> (Better yet, if possible, ditch __sgt_iter() and use the common DRM
> function that AMD uses).
>
> This would at least allow us to make progress with Tom's IOMMU-DMA patch
> set and once that gets in, it will be harder for other drivers to make
> the same mistake.

Best regards
Logan Gunthorpe May 29, 2020, 9:21 p.m. UTC | #9
On 2020-05-29 3:11 p.m., Marek Szyprowski wrote:
> Patches are pending:
> https://lore.kernel.org/linux-iommu/20200513132114.6046-1-m.szyprowski@samsung.com/T/

Cool, nice! Though, I still don't think that fixes the issue in
i915_scatterlist.h given it still ignores sg_dma_len() and strictly
relies on sg_next()/sg_is_last() to stop iterating -- and I suspect this
is the bug that got in Tom's way.

>> However, as Robin pointed out, there are other ugly tricks like stopping
>> iterating through the SGL when sg_dma_len() is zero. For example, the
>> AMD driver appears to use drm_prime_sg_to_page_addr_arrays() which does
>> this trick and thus likely isn't buggy (otherwise, I'd expect someone to
>> have complained by now seeing AMD has already switched to IOMMU-DMA.
> 
> I'm not sure that this is a trick. Stopping at zero sg_dma_len() was 
> somewhere documented.

Well whatever you want to call it, it is ugly to have some drivers doing
one thing with the returned value and others assuming there's an extra
zero at the end. It just causes confusion for people reading/copying the
code. It would be better if they are all consistent. However, I concede
stopping at zero should not be broken, presently.

Logan
Tom Murphy Aug. 24, 2020, 12:04 a.m. UTC | #10
Hi Logan/All,

I have added a check for the sg_dma_len == 0 :
"""
 } __sgt_iter(struct scatterlist *sgl, bool dma) {
        struct sgt_iter s = { .sgp = sgl };

+       if (sgl && sg_dma_len(sgl) == 0)
+           s.sgp = NULL;

        if (s.sgp) {
            .....
"""
at location [1].
but it doens't fix the problem.

You're right though, this change does need to be made, this code
doesn't handle pages of sg_dma_len(sg) == 0 correctly
So my guess is that we have more bugs in other parts of the i915
driver (or there is a problem with my "sg_dma_len == 0" fix above).
I have been trying to spot where else the code might be buggy but I
haven't had any luck so far.

I'm doing a microconfernce (at LPC 2020) this wednesdays [1] on this
if you're interested in attending.
I'm hoping I can chat about it with a few people and find how can
reproduce and fix this issues. I don't have any more time I can give
to this unfortunately and it would be a shame for the work to go to
waste.

[0] https://github.com/torvalds/linux/blob/d012a7190fc1fd72ed48911e77ca97ba4521bccd/drivers/gpu/drm/i915/i915_scatterlist.h#L28
[1] https://linuxplumbersconf.org/event/7/contributions/846/

On Fri, 29 May 2020 at 22:21, Logan Gunthorpe <logang@deltatee.com> wrote:
>
>
>
> On 2020-05-29 3:11 p.m., Marek Szyprowski wrote:
> > Patches are pending:
> > https://lore.kernel.org/linux-iommu/20200513132114.6046-1-m.szyprowski@samsung.com/T/
>
> Cool, nice! Though, I still don't think that fixes the issue in
> i915_scatterlist.h given it still ignores sg_dma_len() and strictly
> relies on sg_next()/sg_is_last() to stop iterating -- and I suspect this
> is the bug that got in Tom's way.
>
> >> However, as Robin pointed out, there are other ugly tricks like stopping
> >> iterating through the SGL when sg_dma_len() is zero. For example, the
> >> AMD driver appears to use drm_prime_sg_to_page_addr_arrays() which does
> >> this trick and thus likely isn't buggy (otherwise, I'd expect someone to
> >> have complained by now seeing AMD has already switched to IOMMU-DMA.
> >
> > I'm not sure that this is a trick. Stopping at zero sg_dma_len() was
> > somewhere documented.
>
> Well whatever you want to call it, it is ugly to have some drivers doing
> one thing with the returned value and others assuming there's an extra
> zero at the end. It just causes confusion for people reading/copying the
> code. It would be better if they are all consistent. However, I concede
> stopping at zero should not be broken, presently.
>
> Logan
Robin Murphy Aug. 26, 2020, 6:14 p.m. UTC | #11
Hi Tom,

On 2019-12-21 15:03, Tom Murphy wrote:
> This patchset converts the intel iommu driver to the dma-iommu api.
> 
> While converting the driver I exposed a bug in the intel i915 driver which causes a huge amount of artifacts on the screen of my laptop. You can see a picture of it here:
> https://github.com/pippy360/kernelPatches/blob/master/IMG_20191219_225922.jpg
> 
> This issue is most likely in the i915 driver and is most likely caused by the driver not respecting the return value of the dma_map_ops::map_sg function. You can see the driver ignoring the return value here:
> https://github.com/torvalds/linux/blob/7e0165b2f1a912a06e381e91f0f4e495f4ac3736/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c#L51
> 
> Previously this didn’t cause issues because the intel map_sg always returned the same number of elements as the input scatter gather list but with the change to this dma-iommu api this is no longer the case. I wasn’t able to track the bug down to a specific line of code unfortunately.
> 
> Could someone from the intel team look at this?
> 
> 
> I have been testing on a lenovo x1 carbon 5th generation. Let me know if there’s any more information you need.
> 
> To allow my patch set to be tested I have added a patch (patch 8/8) in this series to disable combining sg segments in the dma-iommu api which fixes the bug but it doesn't fix the actual problem.
> 
> As part of this patch series I copied the intel bounce buffer code to the dma-iommu path. The addition of the bounce buffer code took me by surprise. I did most of my development on this patch series before the bounce buffer code was added and my reimplementation in the dma-iommu path is very rushed and not properly tested but I’m running out of time to work on this patch set.
> 
> On top of that I also didn’t port over the intel tracing code from this commit:
> https://github.com/torvalds/linux/commit/3b53034c268d550d9e8522e613a14ab53b8840d8#diff-6b3e7c4993f05e76331e463ab1fc87e1
> So all the work in that commit is now wasted. The code will need to be removed and reimplemented in the dma-iommu path. I would like to take the time to do this but I really don’t have the time at the moment and I want to get these changes out before the iommu code changes any more.

Further to what we just discussed at LPC, I've realised that tracepoints 
are actually something I could do with *right now* for debugging my Arm 
DMA ops series, so if I'm going to hack something up anyway I may as 
well take responsibility for polishing it into a proper patch as well :)

Robin.

> 
> Tom Murphy (8):
>    iommu/vt-d: clean up 32bit si_domain assignment
>    iommu/vt-d: Use default dma_direct_* mapping functions for direct
>      mapped devices
>    iommu/vt-d: Remove IOVA handling code from non-dma_ops path
>    iommu: Handle freelists when using deferred flushing in iommu drivers
>    iommu: Add iommu_dma_free_cpu_cached_iovas function
>    iommu: allow the dma-iommu api to use bounce buffers
>    iommu/vt-d: Convert intel iommu driver to the iommu ops
>    DO NOT MERGE: iommu: disable list appending in dma-iommu
> 
>   drivers/iommu/Kconfig           |   1 +
>   drivers/iommu/amd_iommu.c       |  14 +-
>   drivers/iommu/arm-smmu-v3.c     |   3 +-
>   drivers/iommu/arm-smmu.c        |   3 +-
>   drivers/iommu/dma-iommu.c       | 183 +++++--
>   drivers/iommu/exynos-iommu.c    |   3 +-
>   drivers/iommu/intel-iommu.c     | 936 ++++----------------------------
>   drivers/iommu/iommu.c           |  39 +-
>   drivers/iommu/ipmmu-vmsa.c      |   3 +-
>   drivers/iommu/msm_iommu.c       |   3 +-
>   drivers/iommu/mtk_iommu.c       |   3 +-
>   drivers/iommu/mtk_iommu_v1.c    |   3 +-
>   drivers/iommu/omap-iommu.c      |   3 +-
>   drivers/iommu/qcom_iommu.c      |   3 +-
>   drivers/iommu/rockchip-iommu.c  |   3 +-
>   drivers/iommu/s390-iommu.c      |   3 +-
>   drivers/iommu/tegra-gart.c      |   3 +-
>   drivers/iommu/tegra-smmu.c      |   3 +-
>   drivers/iommu/virtio-iommu.c    |   3 +-
>   drivers/vfio/vfio_iommu_type1.c |   2 +-
>   include/linux/dma-iommu.h       |   3 +
>   include/linux/intel-iommu.h     |   1 -
>   include/linux/iommu.h           |  32 +-
>   23 files changed, 345 insertions(+), 908 deletions(-)
>
Alex Deucher Aug. 26, 2020, 6:26 p.m. UTC | #12
On Mon, Aug 24, 2020 at 2:56 AM Tom Murphy <murphyt7@tcd.ie> wrote:
>
> Hi Logan/All,
>
> I have added a check for the sg_dma_len == 0 :
> """
>  } __sgt_iter(struct scatterlist *sgl, bool dma) {
>         struct sgt_iter s = { .sgp = sgl };
>
> +       if (sgl && sg_dma_len(sgl) == 0)
> +           s.sgp = NULL;
>
>         if (s.sgp) {
>             .....
> """
> at location [1].
> but it doens't fix the problem.
>
> You're right though, this change does need to be made, this code
> doesn't handle pages of sg_dma_len(sg) == 0 correctly
> So my guess is that we have more bugs in other parts of the i915
> driver (or there is a problem with my "sg_dma_len == 0" fix above).
> I have been trying to spot where else the code might be buggy but I
> haven't had any luck so far.
>
> I'm doing a microconfernce (at LPC 2020) this wednesdays [1] on this
> if you're interested in attending.
> I'm hoping I can chat about it with a few people and find how can
> reproduce and fix this issues. I don't have any more time I can give
> to this unfortunately and it would be a shame for the work to go to
> waste.
>
> [0] https://github.com/torvalds/linux/blob/d012a7190fc1fd72ed48911e77ca97ba4521bccd/drivers/gpu/drm/i915/i915_scatterlist.h#L28
> [1] https://linuxplumbersconf.org/event/7/contributions/846/
>
> On Fri, 29 May 2020 at 22:21, Logan Gunthorpe <logang@deltatee.com> wrote:
> >
> >
> >
> > On 2020-05-29 3:11 p.m., Marek Szyprowski wrote:
> > > Patches are pending:
> > > https://lore.kernel.org/linux-iommu/20200513132114.6046-1-m.szyprowski@samsung.com/T/
> >
> > Cool, nice! Though, I still don't think that fixes the issue in
> > i915_scatterlist.h given it still ignores sg_dma_len() and strictly
> > relies on sg_next()/sg_is_last() to stop iterating -- and I suspect this
> > is the bug that got in Tom's way.
> >
> > >> However, as Robin pointed out, there are other ugly tricks like stopping
> > >> iterating through the SGL when sg_dma_len() is zero. For example, the
> > >> AMD driver appears to use drm_prime_sg_to_page_addr_arrays() which does
> > >> this trick and thus likely isn't buggy (otherwise, I'd expect someone to
> > >> have complained by now seeing AMD has already switched to IOMMU-DMA.

We ran into the same issue with amdgpu and radeon when the AMD IOMMU
driver was converted and had to fix it as well.  The relevant fixes
were:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=42e67b479eab6d26459b80b4867298232b0435e7
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0199172f933342d8b1011aae2054a695c25726f4
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=47f7826c520ecd92ffbffe59ecaa2fe61e42ec70
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c0f83d164fb8f3a2b7bc379a6c1e27d1123a9eab

Alex

> > >
> > > I'm not sure that this is a trick. Stopping at zero sg_dma_len() was
> > > somewhere documented.
> >
> > Well whatever you want to call it, it is ugly to have some drivers doing
> > one thing with the returned value and others assuming there's an extra
> > zero at the end. It just causes confusion for people reading/copying the
> > code. It would be better if they are all consistent. However, I concede
> > stopping at zero should not be broken, presently.
> >
> > Logan
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Logan Gunthorpe Aug. 27, 2020, 9:36 p.m. UTC | #13
On 2020-08-23 6:04 p.m., Tom Murphy wrote:
> I have added a check for the sg_dma_len == 0 :
> """
>  } __sgt_iter(struct scatterlist *sgl, bool dma) {
>         struct sgt_iter s = { .sgp = sgl };
> 
> +       if (sgl && sg_dma_len(sgl) == 0)
> +           s.sgp = NULL;
> 
>         if (s.sgp) {
>             .....
> """
> at location [1].
> but it doens't fix the problem.

Based on my read of the code, it looks like we also need to change usage
of sgl->length... Something like the rough patch below, maybe?

Also, Tom, do you have an updated version of the patchset to convert the
Intel IOMMU to dma-iommu available? The last one I've found doesn't
apply cleanly (I'm assuming parts of it have been merged in slightly
modified forms).

Thanks,

Logan

--

diff --git a/drivers/gpu/drm/i915/i915_scatterlist.h
b/drivers/gpu/drm/i915/i915
index b7b59328cb76..9367ac801f0c 100644
--- a/drivers/gpu/drm/i915/i915_scatterlist.h
+++ b/drivers/gpu/drm/i915/i915_scatterlist.h
@@ -27,13 +27,19 @@ static __always_inline struct sgt_iter {
 } __sgt_iter(struct scatterlist *sgl, bool dma) {
        struct sgt_iter s = { .sgp = sgl };

+       if (sgl && !sg_dma_len(s.sgp))
+               s.sgp = NULL;
+
        if (s.sgp) {
                s.max = s.curr = s.sgp->offset;
-               s.max += s.sgp->length;
-               if (dma)
+
+               if (dma) {
+                       s.max += sg_dma_len(s.sgp);
                        s.dma = sg_dma_address(s.sgp);
-               else
+               } else {
+                       s.max += s.sgp->length;
                        s.pfn = page_to_pfn(sg_page(s.sgp));
+               }
        }

        return s;
Tom Murphy Aug. 27, 2020, 11:34 p.m. UTC | #14
On Thu, 27 Aug 2020 at 22:36, Logan Gunthorpe <logang@deltatee.com> wrote:
>
>
>
> On 2020-08-23 6:04 p.m., Tom Murphy wrote:
> > I have added a check for the sg_dma_len == 0 :
> > """
> >  } __sgt_iter(struct scatterlist *sgl, bool dma) {
> >         struct sgt_iter s = { .sgp = sgl };
> >
> > +       if (sgl && sg_dma_len(sgl) == 0)
> > +           s.sgp = NULL;
> >
> >         if (s.sgp) {
> >             .....
> > """
> > at location [1].
> > but it doens't fix the problem.
>
> Based on my read of the code, it looks like we also need to change usage
> of sgl->length... Something like the rough patch below, maybe?
>
> Also, Tom, do you have an updated version of the patchset to convert the
> Intel IOMMU to dma-iommu available? The last one I've found doesn't
> apply cleanly (I'm assuming parts of it have been merged in slightly
> modified forms).
>

I'll try and post one in the next 24hours

> Thanks,
>
> Logan
>
> --
>
> diff --git a/drivers/gpu/drm/i915/i915_scatterlist.h
> b/drivers/gpu/drm/i915/i915
> index b7b59328cb76..9367ac801f0c 100644
> --- a/drivers/gpu/drm/i915/i915_scatterlist.h
> +++ b/drivers/gpu/drm/i915/i915_scatterlist.h
> @@ -27,13 +27,19 @@ static __always_inline struct sgt_iter {
>  } __sgt_iter(struct scatterlist *sgl, bool dma) {
>         struct sgt_iter s = { .sgp = sgl };
>
> +       if (sgl && !sg_dma_len(s.sgp))
> +               s.sgp = NULL;
> +
>         if (s.sgp) {
>                 s.max = s.curr = s.sgp->offset;
> -               s.max += s.sgp->length;
> -               if (dma)
> +
> +               if (dma) {
> +                       s.max += sg_dma_len(s.sgp);
>                         s.dma = sg_dma_address(s.sgp);
> -               else
> +               } else {
> +                       s.max += s.sgp->length;
>                         s.pfn = page_to_pfn(sg_page(s.sgp));
> +               }
>         }
>
>         return s;
Tom Murphy Sept. 3, 2020, 8:26 p.m. UTC | #15
On Fri, 28 Aug 2020 at 00:34, Tom Murphy <murphyt7@tcd.ie> wrote:
>
> On Thu, 27 Aug 2020 at 22:36, Logan Gunthorpe <logang@deltatee.com> wrote:
> >
> >
> >
> > On 2020-08-23 6:04 p.m., Tom Murphy wrote:
> > > I have added a check for the sg_dma_len == 0 :
> > > """
> > >  } __sgt_iter(struct scatterlist *sgl, bool dma) {
> > >         struct sgt_iter s = { .sgp = sgl };
> > >
> > > +       if (sgl && sg_dma_len(sgl) == 0)
> > > +           s.sgp = NULL;
> > >
> > >         if (s.sgp) {
> > >             .....
> > > """
> > > at location [1].
> > > but it doens't fix the problem.
> >
> > Based on my read of the code, it looks like we also need to change usage
> > of sgl->length... Something like the rough patch below, maybe?
> >
> > Also, Tom, do you have an updated version of the patchset to convert the
> > Intel IOMMU to dma-iommu available? The last one I've found doesn't
> > apply cleanly (I'm assuming parts of it have been merged in slightly
> > modified forms).
> >
>
> I'll try and post one in the next 24hours

I have just posted this now:
The subject of the cover letter is:
"[PATCH V2 0/5] Convert the intel iommu driver to the dma-iommu api"

>
> > Thanks,
> >
> > Logan
> >
> > --
> >
> > diff --git a/drivers/gpu/drm/i915/i915_scatterlist.h
> > b/drivers/gpu/drm/i915/i915
> > index b7b59328cb76..9367ac801f0c 100644
> > --- a/drivers/gpu/drm/i915/i915_scatterlist.h
> > +++ b/drivers/gpu/drm/i915/i915_scatterlist.h
> > @@ -27,13 +27,19 @@ static __always_inline struct sgt_iter {
> >  } __sgt_iter(struct scatterlist *sgl, bool dma) {
> >         struct sgt_iter s = { .sgp = sgl };
> >
> > +       if (sgl && !sg_dma_len(s.sgp))
> > +               s.sgp = NULL;
> > +
> >         if (s.sgp) {
> >                 s.max = s.curr = s.sgp->offset;
> > -               s.max += s.sgp->length;
> > -               if (dma)
> > +
> > +               if (dma) {
> > +                       s.max += sg_dma_len(s.sgp);
> >                         s.dma = sg_dma_address(s.sgp);
> > -               else
> > +               } else {
> > +                       s.max += s.sgp->length;
> >                         s.pfn = page_to_pfn(sg_page(s.sgp));
> > +               }
> >         }
> >
> >         return s;
Tvrtko Ursulin Sept. 8, 2020, 3:28 p.m. UTC | #16
Hi,

On 27/08/2020 22:36, Logan Gunthorpe wrote:
> On 2020-08-23 6:04 p.m., Tom Murphy wrote:
>> I have added a check for the sg_dma_len == 0 :
>> """
>>   } __sgt_iter(struct scatterlist *sgl, bool dma) {
>>          struct sgt_iter s = { .sgp = sgl };
>>
>> +       if (sgl && sg_dma_len(sgl) == 0)
>> +           s.sgp = NULL;
>>
>>          if (s.sgp) {
>>              .....
>> """
>> at location [1].
>> but it doens't fix the problem.
> 
> Based on my read of the code, it looks like we also need to change usage
> of sgl->length... Something like the rough patch below, maybe?

This thread was brought to my attention and I initially missed this 
reply. Essentially I came to the same conclusion about the need to use 
sg_dma_len. One small correction below:

> Also, Tom, do you have an updated version of the patchset to convert the
> Intel IOMMU to dma-iommu available? The last one I've found doesn't
> apply cleanly (I'm assuming parts of it have been merged in slightly
> modified forms).
> 
> Thanks,
> 
> Logan
> 
> --
> 
> diff --git a/drivers/gpu/drm/i915/i915_scatterlist.h
> b/drivers/gpu/drm/i915/i915
> index b7b59328cb76..9367ac801f0c 100644
> --- a/drivers/gpu/drm/i915/i915_scatterlist.h
> +++ b/drivers/gpu/drm/i915/i915_scatterlist.h
> @@ -27,13 +27,19 @@ static __always_inline struct sgt_iter {
>   } __sgt_iter(struct scatterlist *sgl, bool dma) {
>          struct sgt_iter s = { .sgp = sgl };
> 
> +       if (sgl && !sg_dma_len(s.sgp))

I'd extend the condition to be, just to be safe:

	if (dma && sgl && !sg_dma_len(s.sgp))

> +               s.sgp = NULL;
> +
>          if (s.sgp) {
>                  s.max = s.curr = s.sgp->offset;
> -               s.max += s.sgp->length;
> -               if (dma)
> +
> +               if (dma) {
> +                       s.max += sg_dma_len(s.sgp);
>                          s.dma = sg_dma_address(s.sgp);
> -               else
> +               } else {
> +                       s.max += s.sgp->length;
>                          s.pfn = page_to_pfn(sg_page(s.sgp));
> +               }

Otherwise has this been tested or alternatively how to test it? (How to 
repro the issue.)

Regards,

Tvrtko
Logan Gunthorpe Sept. 8, 2020, 3:44 p.m. UTC | #17
On 2020-09-08 9:28 a.m., Tvrtko Ursulin wrote:
>>
>> diff --git a/drivers/gpu/drm/i915/i915_scatterlist.h
>> b/drivers/gpu/drm/i915/i915
>> index b7b59328cb76..9367ac801f0c 100644
>> --- a/drivers/gpu/drm/i915/i915_scatterlist.h
>> +++ b/drivers/gpu/drm/i915/i915_scatterlist.h
>> @@ -27,13 +27,19 @@ static __always_inline struct sgt_iter {
>>   } __sgt_iter(struct scatterlist *sgl, bool dma) {
>>          struct sgt_iter s = { .sgp = sgl };
>>
>> +       if (sgl && !sg_dma_len(s.sgp))
> 
> I'd extend the condition to be, just to be safe:
>     if (dma && sgl && !sg_dma_len(s.sgp))
>

Right, good catch, that's definitely necessary.

>> +               s.sgp = NULL;
>> +
>>          if (s.sgp) {
>>                  s.max = s.curr = s.sgp->offset;
>> -               s.max += s.sgp->length;
>> -               if (dma)
>> +
>> +               if (dma) {
>> +                       s.max += sg_dma_len(s.sgp);
>>                          s.dma = sg_dma_address(s.sgp);
>> -               else
>> +               } else {
>> +                       s.max += s.sgp->length;
>>                          s.pfn = page_to_pfn(sg_page(s.sgp));
>> +               }
> 
> Otherwise has this been tested or alternatively how to test it? (How to
> repro the issue.)

It has not been tested. To test it, you need Tom's patch set without the
last "DO NOT MERGE" patch:

https://lkml.kernel.org/lkml/20200907070035.GA25114@infradead.org/T/

Thanks,

Logan
Tom Murphy Sept. 8, 2020, 10:43 p.m. UTC | #18
On Tue, 8 Sep 2020 at 16:56, Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
>
> On 08/09/2020 16:44, Logan Gunthorpe wrote:
> > On 2020-09-08 9:28 a.m., Tvrtko Ursulin wrote:
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/i915_scatterlist.h
> >>> b/drivers/gpu/drm/i915/i915
> >>> index b7b59328cb76..9367ac801f0c 100644
> >>> --- a/drivers/gpu/drm/i915/i915_scatterlist.h
> >>> +++ b/drivers/gpu/drm/i915/i915_scatterlist.h
> >>> @@ -27,13 +27,19 @@ static __always_inline struct sgt_iter {
> >>>    } __sgt_iter(struct scatterlist *sgl, bool dma) {
> >>>           struct sgt_iter s = { .sgp = sgl };
> >>>
> >>> +       if (sgl && !sg_dma_len(s.sgp))
> >>
> >> I'd extend the condition to be, just to be safe:
> >>      if (dma && sgl && !sg_dma_len(s.sgp))
> >>
> >
> > Right, good catch, that's definitely necessary.
> >
> >>> +               s.sgp = NULL;
> >>> +
> >>>           if (s.sgp) {
> >>>                   s.max = s.curr = s.sgp->offset;
> >>> -               s.max += s.sgp->length;
> >>> -               if (dma)
> >>> +
> >>> +               if (dma) {
> >>> +                       s.max += sg_dma_len(s.sgp);
> >>>                           s.dma = sg_dma_address(s.sgp);
> >>> -               else
> >>> +               } else {
> >>> +                       s.max += s.sgp->length;
> >>>                           s.pfn = page_to_pfn(sg_page(s.sgp));
> >>> +               }
> >>
> >> Otherwise has this been tested or alternatively how to test it? (How to
> >> repro the issue.)
> >
> > It has not been tested. To test it, you need Tom's patch set without the
> > last "DO NOT MERGE" patch:
> >
> > https://lkml.kernel.org/lkml/20200907070035.GA25114@infradead.org/T/
>
> Tom, do you have a branch somewhere I could pull from? (Just being lazy
> about downloading a bunch of messages from the archives.)

I don't unfortunately. I'm working locally with poor internet.

>
> What GPU is in your Lenovo x1 carbon 5th generation and what
> graphical/desktop setup I need to repro?


Is this enough info?:

$ lspci -vnn | grep VGA -A 12
00:02.0 VGA compatible controller [0300]: Intel Corporation HD
Graphics 620 [8086:5916] (rev 02) (prog-if 00 [VGA controller])
    Subsystem: Lenovo ThinkPad X1 Carbon 5th Gen [17aa:224f]
    Flags: bus master, fast devsel, latency 0, IRQ 148
    Memory at eb000000 (64-bit, non-prefetchable) [size=16M]
    Memory at 60000000 (64-bit, prefetchable) [size=256M]
    I/O ports at e000 [size=64]
    [virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
    Capabilities: [40] Vendor Specific Information: Len=0c <?>
    Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
    Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
    Capabilities: [d0] Power Management version 2
    Capabilities: [100] Process Address Space ID (PASID)
    Capabilities: [200] Address Translation Service (ATS)


>
> Regards,
>
> Tvrtko
Tvrtko Ursulin Sept. 9, 2020, 9:16 a.m. UTC | #19
On 08/09/2020 23:43, Tom Murphy wrote:
> On Tue, 8 Sep 2020 at 16:56, Tvrtko Ursulin
> <tvrtko.ursulin@linux.intel.com> wrote:
>>
>>
>> On 08/09/2020 16:44, Logan Gunthorpe wrote:
>>> On 2020-09-08 9:28 a.m., Tvrtko Ursulin wrote:
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_scatterlist.h
>>>>> b/drivers/gpu/drm/i915/i915
>>>>> index b7b59328cb76..9367ac801f0c 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_scatterlist.h
>>>>> +++ b/drivers/gpu/drm/i915/i915_scatterlist.h
>>>>> @@ -27,13 +27,19 @@ static __always_inline struct sgt_iter {
>>>>>     } __sgt_iter(struct scatterlist *sgl, bool dma) {
>>>>>            struct sgt_iter s = { .sgp = sgl };
>>>>>
>>>>> +       if (sgl && !sg_dma_len(s.sgp))
>>>>
>>>> I'd extend the condition to be, just to be safe:
>>>>       if (dma && sgl && !sg_dma_len(s.sgp))
>>>>
>>>
>>> Right, good catch, that's definitely necessary.
>>>
>>>>> +               s.sgp = NULL;
>>>>> +
>>>>>            if (s.sgp) {
>>>>>                    s.max = s.curr = s.sgp->offset;
>>>>> -               s.max += s.sgp->length;
>>>>> -               if (dma)
>>>>> +
>>>>> +               if (dma) {
>>>>> +                       s.max += sg_dma_len(s.sgp);
>>>>>                            s.dma = sg_dma_address(s.sgp);
>>>>> -               else
>>>>> +               } else {
>>>>> +                       s.max += s.sgp->length;
>>>>>                            s.pfn = page_to_pfn(sg_page(s.sgp));
>>>>> +               }
>>>>
>>>> Otherwise has this been tested or alternatively how to test it? (How to
>>>> repro the issue.)
>>>
>>> It has not been tested. To test it, you need Tom's patch set without the
>>> last "DO NOT MERGE" patch:
>>>
>>> https://lkml.kernel.org/lkml/20200907070035.GA25114@infradead.org/T/
>>
>> Tom, do you have a branch somewhere I could pull from? (Just being lazy
>> about downloading a bunch of messages from the archives.)
> 
> I don't unfortunately. I'm working locally with poor internet.
> 
>>
>> What GPU is in your Lenovo x1 carbon 5th generation and what
>> graphical/desktop setup I need to repro?
> 
> 
> Is this enough info?:
> 
> $ lspci -vnn | grep VGA -A 12
> 00:02.0 VGA compatible controller [0300]: Intel Corporation HD
> Graphics 620 [8086:5916] (rev 02) (prog-if 00 [VGA controller])
>      Subsystem: Lenovo ThinkPad X1 Carbon 5th Gen [17aa:224f]
>      Flags: bus master, fast devsel, latency 0, IRQ 148
>      Memory at eb000000 (64-bit, non-prefetchable) [size=16M]
>      Memory at 60000000 (64-bit, prefetchable) [size=256M]
>      I/O ports at e000 [size=64]
>      [virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
>      Capabilities: [40] Vendor Specific Information: Len=0c <?>
>      Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
>      Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
>      Capabilities: [d0] Power Management version 2
>      Capabilities: [100] Process Address Space ID (PASID)
>      Capabilities: [200] Address Translation Service (ATS)

Works for a start. What about the steps to repro? Any desktop 
environment and it is just visual corruption, no hangs/stalls or such?

I've submitted a series consisting of what I understood are the patches 
needed to repro the issue to our automated CI here:

https://patchwork.freedesktop.org/series/81489/

So will see if it will catch something, or more targeted testing will be 
required. Hopefully it does trip over in which case I can add the patch 
suggested by Logan on top and see if that fixes it. Or I'll need to 
write a new test case.

If you could glance over my series to check I identified the patches 
correctly it would be appreciated.

Regards,

Tvrtko
Tom Murphy Sept. 10, 2020, 1:33 p.m. UTC | #20
On Wed, 9 Sep 2020 at 13:56, Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
>
> On 09/09/2020 10:16, Tvrtko Ursulin wrote:
> > On 08/09/2020 23:43, Tom Murphy wrote:
> >> On Tue, 8 Sep 2020 at 16:56, Tvrtko Ursulin
> >> <tvrtko.ursulin@linux.intel.com> wrote:
> >>> On 08/09/2020 16:44, Logan Gunthorpe wrote:
> >>>> On 2020-09-08 9:28 a.m., Tvrtko Ursulin wrote:
> >>>>>>
> >>>>>> diff --git a/drivers/gpu/drm/i915/i915_scatterlist.h
> >>>>>> b/drivers/gpu/drm/i915/i915
> >>>>>> index b7b59328cb76..9367ac801f0c 100644
> >>>>>> --- a/drivers/gpu/drm/i915/i915_scatterlist.h
> >>>>>> +++ b/drivers/gpu/drm/i915/i915_scatterlist.h
> >>>>>> @@ -27,13 +27,19 @@ static __always_inline struct sgt_iter {
> >>>>>>     } __sgt_iter(struct scatterlist *sgl, bool dma) {
> >>>>>>            struct sgt_iter s = { .sgp = sgl };
> >>>>>>
> >>>>>> +       if (sgl && !sg_dma_len(s.sgp))
> >>>>>
> >>>>> I'd extend the condition to be, just to be safe:
> >>>>>       if (dma && sgl && !sg_dma_len(s.sgp))
> >>>>>
> >>>>
> >>>> Right, good catch, that's definitely necessary.
> >>>>
> >>>>>> +               s.sgp = NULL;
> >>>>>> +
> >>>>>>            if (s.sgp) {
> >>>>>>                    s.max = s.curr = s.sgp->offset;
> >>>>>> -               s.max += s.sgp->length;
> >>>>>> -               if (dma)
> >>>>>> +
> >>>>>> +               if (dma) {
> >>>>>> +                       s.max += sg_dma_len(s.sgp);
> >>>>>>                            s.dma = sg_dma_address(s.sgp);
> >>>>>> -               else
> >>>>>> +               } else {
> >>>>>> +                       s.max += s.sgp->length;
> >>>>>>                            s.pfn = page_to_pfn(sg_page(s.sgp));
> >>>>>> +               }
> >>>>>
> >>>>> Otherwise has this been tested or alternatively how to test it?
> >>>>> (How to
> >>>>> repro the issue.)
> >>>>
> >>>> It has not been tested. To test it, you need Tom's patch set without
> >>>> the
> >>>> last "DO NOT MERGE" patch:
> >>>>
> >>>> https://lkml.kernel.org/lkml/20200907070035.GA25114@infradead.org/T/
> >>>
> >>> Tom, do you have a branch somewhere I could pull from? (Just being lazy
> >>> about downloading a bunch of messages from the archives.)
> >>
> >> I don't unfortunately. I'm working locally with poor internet.
> >>
> >>>
> >>> What GPU is in your Lenovo x1 carbon 5th generation and what
> >>> graphical/desktop setup I need to repro?
> >>
> >>
> >> Is this enough info?:
> >>
> >> $ lspci -vnn | grep VGA -A 12
> >> 00:02.0 VGA compatible controller [0300]: Intel Corporation HD
> >> Graphics 620 [8086:5916] (rev 02) (prog-if 00 [VGA controller])
> >>      Subsystem: Lenovo ThinkPad X1 Carbon 5th Gen [17aa:224f]
> >>      Flags: bus master, fast devsel, latency 0, IRQ 148
> >>      Memory at eb000000 (64-bit, non-prefetchable) [size=16M]
> >>      Memory at 60000000 (64-bit, prefetchable) [size=256M]
> >>      I/O ports at e000 [size=64]
> >>      [virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
> >>      Capabilities: [40] Vendor Specific Information: Len=0c <?>
> >>      Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
> >>      Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
> >>      Capabilities: [d0] Power Management version 2
> >>      Capabilities: [100] Process Address Space ID (PASID)
> >>      Capabilities: [200] Address Translation Service (ATS)
> >
> > Works for a start. What about the steps to repro? Any desktop
> > environment and it is just visual corruption, no hangs/stalls or such?
> >
> > I've submitted a series consisting of what I understood are the patches
> > needed to repro the issue to our automated CI here:
> >
> > https://patchwork.freedesktop.org/series/81489/
> >
> > So will see if it will catch something, or more targeted testing will be
> > required. Hopefully it does trip over in which case I can add the patch
> > suggested by Logan on top and see if that fixes it. Or I'll need to
> > write a new test case.
> >
> > If you could glance over my series to check I identified the patches
> > correctly it would be appreciated.
>
> Our CI was more than capable at catching the breakage so I've copied you
> on a patch (https://patchwork.freedesktop.org/series/81497/) which has a
> good potential to fix this. (Or improve the robustness of our sg walks,
> depends how you look at it.)
>
> Would you be able to test it in your environment by any chance? If it
> works I understand it unblocks your IOMMU work, right?

I tested your latest patch set ([PATCH 1/2] drm/i915: Fix DMA mapped
scatterlist walks) and it fixes the issue. great work!

>
> Regards,
>
> Tvrtko
Tom Murphy Sept. 10, 2020, 1:34 p.m. UTC | #21
On Thu, 10 Sep 2020 at 14:33, Tom Murphy <murphyt7@tcd.ie> wrote:
>
> On Wed, 9 Sep 2020 at 13:56, Tvrtko Ursulin
> <tvrtko.ursulin@linux.intel.com> wrote:
> >
> >
> > On 09/09/2020 10:16, Tvrtko Ursulin wrote:
> > > On 08/09/2020 23:43, Tom Murphy wrote:
> > >> On Tue, 8 Sep 2020 at 16:56, Tvrtko Ursulin
> > >> <tvrtko.ursulin@linux.intel.com> wrote:
> > >>> On 08/09/2020 16:44, Logan Gunthorpe wrote:
> > >>>> On 2020-09-08 9:28 a.m., Tvrtko Ursulin wrote:
> > >>>>>>
> > >>>>>> diff --git a/drivers/gpu/drm/i915/i915_scatterlist.h
> > >>>>>> b/drivers/gpu/drm/i915/i915
> > >>>>>> index b7b59328cb76..9367ac801f0c 100644
> > >>>>>> --- a/drivers/gpu/drm/i915/i915_scatterlist.h
> > >>>>>> +++ b/drivers/gpu/drm/i915/i915_scatterlist.h
> > >>>>>> @@ -27,13 +27,19 @@ static __always_inline struct sgt_iter {
> > >>>>>>     } __sgt_iter(struct scatterlist *sgl, bool dma) {
> > >>>>>>            struct sgt_iter s = { .sgp = sgl };
> > >>>>>>
> > >>>>>> +       if (sgl && !sg_dma_len(s.sgp))
> > >>>>>
> > >>>>> I'd extend the condition to be, just to be safe:
> > >>>>>       if (dma && sgl && !sg_dma_len(s.sgp))
> > >>>>>
> > >>>>
> > >>>> Right, good catch, that's definitely necessary.
> > >>>>
> > >>>>>> +               s.sgp = NULL;
> > >>>>>> +
> > >>>>>>            if (s.sgp) {
> > >>>>>>                    s.max = s.curr = s.sgp->offset;
> > >>>>>> -               s.max += s.sgp->length;
> > >>>>>> -               if (dma)
> > >>>>>> +
> > >>>>>> +               if (dma) {
> > >>>>>> +                       s.max += sg_dma_len(s.sgp);
> > >>>>>>                            s.dma = sg_dma_address(s.sgp);
> > >>>>>> -               else
> > >>>>>> +               } else {
> > >>>>>> +                       s.max += s.sgp->length;
> > >>>>>>                            s.pfn = page_to_pfn(sg_page(s.sgp));
> > >>>>>> +               }
> > >>>>>
> > >>>>> Otherwise has this been tested or alternatively how to test it?
> > >>>>> (How to
> > >>>>> repro the issue.)
> > >>>>
> > >>>> It has not been tested. To test it, you need Tom's patch set without
> > >>>> the
> > >>>> last "DO NOT MERGE" patch:
> > >>>>
> > >>>> https://lkml.kernel.org/lkml/20200907070035.GA25114@infradead.org/T/
> > >>>
> > >>> Tom, do you have a branch somewhere I could pull from? (Just being lazy
> > >>> about downloading a bunch of messages from the archives.)
> > >>
> > >> I don't unfortunately. I'm working locally with poor internet.
> > >>
> > >>>
> > >>> What GPU is in your Lenovo x1 carbon 5th generation and what
> > >>> graphical/desktop setup I need to repro?
> > >>
> > >>
> > >> Is this enough info?:
> > >>
> > >> $ lspci -vnn | grep VGA -A 12
> > >> 00:02.0 VGA compatible controller [0300]: Intel Corporation HD
> > >> Graphics 620 [8086:5916] (rev 02) (prog-if 00 [VGA controller])
> > >>      Subsystem: Lenovo ThinkPad X1 Carbon 5th Gen [17aa:224f]
> > >>      Flags: bus master, fast devsel, latency 0, IRQ 148
> > >>      Memory at eb000000 (64-bit, non-prefetchable) [size=16M]
> > >>      Memory at 60000000 (64-bit, prefetchable) [size=256M]
> > >>      I/O ports at e000 [size=64]
> > >>      [virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
> > >>      Capabilities: [40] Vendor Specific Information: Len=0c <?>
> > >>      Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
> > >>      Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
> > >>      Capabilities: [d0] Power Management version 2
> > >>      Capabilities: [100] Process Address Space ID (PASID)
> > >>      Capabilities: [200] Address Translation Service (ATS)
> > >
> > > Works for a start. What about the steps to repro? Any desktop
> > > environment and it is just visual corruption, no hangs/stalls or such?
> > >
> > > I've submitted a series consisting of what I understood are the patches
> > > needed to repro the issue to our automated CI here:
> > >
> > > https://patchwork.freedesktop.org/series/81489/
> > >
> > > So will see if it will catch something, or more targeted testing will be
> > > required. Hopefully it does trip over in which case I can add the patch
> > > suggested by Logan on top and see if that fixes it. Or I'll need to
> > > write a new test case.
> > >
> > > If you could glance over my series to check I identified the patches
> > > correctly it would be appreciated.
> >
> > Our CI was more than capable at catching the breakage so I've copied you
> > on a patch (https://patchwork.freedesktop.org/series/81497/) which has a
> > good potential to fix this. (Or improve the robustness of our sg walks,
> > depends how you look at it.)
> >
> > Would you be able to test it in your environment by any chance? If it
> > works I understand it unblocks your IOMMU work, right?

And yes this does unblock the iommu work

>
> I tested your latest patch set ([PATCH 1/2] drm/i915: Fix DMA mapped
> scatterlist walks) and it fixes the issue. great work!
>
> >
> > Regards,
> >
> > Tvrtko