Message ID | CAPM=9txGww+omvateOTizZRV9_wLdAbq6uAz3DRa_S6bn1jQuQ@mail.gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [git,pull] drm for 5.8-rc1 | expand |
On Mon, Jun 1, 2020 at 11:06 PM Dave Airlie <airlied@gmail.com> wrote: > > This tree is a bit conflicty, the i915 ones are probably the hairy > ones, but amdgpu has a bunch as well, along with smattering of others. Hmm. Some of them are due to your previous mis-merges. Your commit 937eea297e26 ("Merge tag 'amd-drm-next-5.8-2020-04-24' of git://people.freedesktop.org/~agd5f/linux into drm-next") seems to have mis-merged the CONFIG_DEBUG_FS thing in drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c. I'm still working through the rest of the merge, so far that was the only one that made me go "Whaa?". Linus
On Tue, Jun 2, 2020 at 2:21 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > Hmm. Some of them are due to your previous mis-merges. > > Your commit 937eea297e26 ("Merge tag 'amd-drm-next-5.8-2020-04-24' of > git://people.freedesktop.org/~agd5f/linux into drm-next") seems to > have mis-merged the CONFIG_DEBUG_FS thing in > drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c. Sorry, wrong filename. That should have been drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c, I cut-and-pasted the wrong path from the conflict list.. Linus
On Tue, Jun 2, 2020 at 2:21 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > I'm still working through the rest of the merge, so far that was the > only one that made me go "Whaa?". Hmm. I'm also ending up effectively reverting the drm commit b28ad7deb2f2 ("drm/tidss: Use simple encoder") because commit 9da67433f64e ("drm/tidss: fix crash related to accessing freed memory") made the premise of that simply encoder commit no longer be true. If there is a better way to sort that out (ie something like "use simple encoder but make it free things at destroy time"), I don't know of it. I'll let you guys fight it out (added people involved with those commits to the participants, Linus
On Mon, Jun 1, 2020 at 11:06 PM Dave Airlie <airlied@gmail.com> wrote: > > I've pushed a merged by me tree here, which I think gets them all > correct, but please let me know if you think different. > https://cgit.freedesktop.org/~airlied/linux/log/?h=drm-5.8-merged Ok, I get the same result, except my resolution to the simple encoder issue was slightly different. I removed the simple helper header include too as part of basically undoing the whole simple encoder conversion. But other than that we're identical, which is a good sign. Apparently the drm mis-merge in the middle got fixed up. Linus
The pull request you sent on Tue, 2 Jun 2020 16:06:32 +1000:
> git://anongit.freedesktop.org/drm/drm tags/drm-next-2020-06-02
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/faa392181a0bd42c5478175cef601adeecdc91b6
Thank you!
On Wed, 3 Jun 2020 at 08:14, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Mon, Jun 1, 2020 at 11:06 PM Dave Airlie <airlied@gmail.com> wrote: > > > > I've pushed a merged by me tree here, which I think gets them all > > correct, but please let me know if you think different. > > https://cgit.freedesktop.org/~airlied/linux/log/?h=drm-5.8-merged > > Ok, I get the same result, except my resolution to the simple encoder > issue was slightly different. I removed the simple helper header > include too as part of basically undoing the whole simple encoder > conversion. Yes sounds like my experience. I spent time on the tides and it was a revert pretty much of the commit in next, I just missed the header include line. I also realised I'd likely mismerged earlier when fixing this up, I'm going to have to put more time into merge fixing up, I'm still not always happy with my methods of figuring out what the correct answer is. > But other than that we're identical, which is a good sign. Apparently > the drm mis-merge in the middle got fixed up. Cool, thanks for redoing it, since this was definitely one of the more conflicty ones I've had in a while. Dave.
Hi Am 02.06.20 um 23:56 schrieb Linus Torvalds: > On Tue, Jun 2, 2020 at 2:21 PM Linus Torvalds > <torvalds@linux-foundation.org> wrote: >> >> I'm still working through the rest of the merge, so far that was the >> only one that made me go "Whaa?". > > Hmm. I'm also ending up effectively reverting the drm commit > b28ad7deb2f2 ("drm/tidss: Use simple encoder") because commit > 9da67433f64e ("drm/tidss: fix crash related to accessing freed > memory") made the premise of that simply encoder commit no longer be > true. That's OK. The simple encoder is just for consolidating these almost-empty encoders at a single place. > If there is a better way to sort that out (ie something like "use > simple encoder but make it free things at destroy time"), I don't know > of it. There's now drmm_kmalloc() to auto-free the memory when DRM releases a device. Best regards Thomas > > I'll let you guys fight it out (added people involved with those > commits to the participants, > > Linus > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel >
On Wed, Jun 3, 2020 at 9:18 AM Thomas Zimmermann <tzimmermann@suse.de> wrote: > > Hi > > Am 02.06.20 um 23:56 schrieb Linus Torvalds: > > On Tue, Jun 2, 2020 at 2:21 PM Linus Torvalds > > <torvalds@linux-foundation.org> wrote: > >> > >> I'm still working through the rest of the merge, so far that was the > >> only one that made me go "Whaa?". > > > > Hmm. I'm also ending up effectively reverting the drm commit > > b28ad7deb2f2 ("drm/tidss: Use simple encoder") because commit > > 9da67433f64e ("drm/tidss: fix crash related to accessing freed > > memory") made the premise of that simply encoder commit no longer be > > true. > > That's OK. The simple encoder is just for consolidating these > almost-empty encoders at a single place. > > > If there is a better way to sort that out (ie something like "use > > simple encoder but make it free things at destroy time"), I don't know > > of it. > > There's now drmm_kmalloc() to auto-free the memory when DRM releases a > device. Yeah I think we discussed that tidss patch on dri-devel when it showed up, right fix is to essentially undo it, replace with a s/devm_kzalloc/drmm_kmalloc/ and then re-apply the simple encoder conversion. We had (and I think still have) some details to sort out in all this, so some back&forth is entirely expected here. Also it's just driver unload, which at least for integrated gpu no user ever cares about, only developers. -Daniel > > Best regards > Thomas > > > > > I'll let you guys fight it out (added people involved with those > > commits to the participants, > > > > Linus > > _______________________________________________ > > dri-devel mailing list > > dri-devel@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/dri-devel > > > > -- > Thomas Zimmermann > Graphics Driver Developer > SUSE Software Solutions Germany GmbH > Maxfeldstr. 5, 90409 Nürnberg, Germany > (HRB 36809, AG Nürnberg) > Geschäftsführer: Felix Imendörffer >
On Tue, Jun 02, 2020 at 04:06:32PM +1000, Dave Airlie wrote: > Hi Linus, > > This is the main drm pull request for 5.8-rc1. > > Highlights: > Core DRM had a lot of refactoring around managed drm resources to make > drivers simpler. > Intel Tigerlake support is on by default > amdgpu now support p2p PCI buffer sharing and encrypted GPU memory Christoph Hellwig basically NAK'd this approach, why is it getting merged all of a sudden?? https://lore.kernel.org/intel-gfx/20200311152838.GA24280@infradead.org/ Are we now OK with this same approach open coded in a driver? This wasn't Cc'd to the usual people doing work in this PCI P2P area?? See commit f44ffd677fb3562ac0a1ff9c8ae52672be741f00 Author: Christian König <christian.koenig@amd.com> Date: Fri Mar 23 16:56:37 2018 +0100 drm/amdgpu: add support for exporting VRAM using DMA-buf v3 We should be able to do this now after checking all the prerequisites. v2: fix entrie count in the sgt v3: manually construct the sg Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Sumit Semwal <sumit.semwal@linaro.org> Link: https://patchwork.freedesktop.org/patch/359295 [..] diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index 82a3299e53c042..128a667ed8fa0d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -22,6 +22,7 @@ * Authors: Christian König */ +#include <linux/dma-mapping.h> #include "amdgpu.h" #include "amdgpu_vm.h" #include "amdgpu_atomfirmware.h" @@ -458,6 +459,104 @@ static void amdgpu_vram_mgr_del(struct ttm_mem_type_manager *man, mem->mm_node = NULL; } +/** + * amdgpu_vram_mgr_alloc_sgt - allocate and fill a sg table + * + * @adev: amdgpu device pointer + * @mem: TTM memory object + * @dev: the other device + * @dir: dma direction + * @sgt: resulting sg table + * + * Allocate and fill a sg table from a VRAM allocation. + */ +int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev, + struct ttm_mem_reg *mem, + struct device *dev, + enum dma_data_direction dir, + struct sg_table **sgt) +{ + struct drm_mm_node *node; + struct scatterlist *sg; + int num_entries = 0; + unsigned int pages; + int i, r; + + *sgt = kmalloc(sizeof(*sg), GFP_KERNEL); + if (!*sgt) + return -ENOMEM; + + for (pages = mem->num_pages, node = mem->mm_node; + pages; pages -= node->size, ++node) + ++num_entries; + + r = sg_alloc_table(*sgt, num_entries, GFP_KERNEL); + if (r) + goto error_free; + + for_each_sg((*sgt)->sgl, sg, num_entries, i) + sg->length = 0; + + node = mem->mm_node; + for_each_sg((*sgt)->sgl, sg, num_entries, i) { + phys_addr_t phys = (node->start << PAGE_SHIFT) + + adev->gmc.aper_base; + size_t size = node->size << PAGE_SHIFT; + dma_addr_t addr; + + ++node; + addr = dma_map_resource(dev, phys, size, dir, + DMA_ATTR_SKIP_CPU_SYNC); + r = dma_mapping_error(dev, addr); + if (r) + goto error_unmap; + + sg_set_page(sg, NULL, size, 0); + sg_dma_address(sg) = addr; + sg_dma_len(sg) = size; ^^^^^^^^^^^^^^ Jason
Am 03.06.20 um 22:13 schrieb Jason Gunthorpe: > On Tue, Jun 02, 2020 at 04:06:32PM +1000, Dave Airlie wrote: >> Hi Linus, >> >> This is the main drm pull request for 5.8-rc1. >> >> Highlights: >> Core DRM had a lot of refactoring around managed drm resources to make >> drivers simpler. >> Intel Tigerlake support is on by default >> amdgpu now support p2p PCI buffer sharing and encrypted GPU memory > Christoph Hellwig basically NAK'd this approach, why is it getting > merged all of a sudden?? Dave and Daniel explicitly said they want to have this and it is ok as long as I open code it in the driver and keep it AMD internal. We have that in discussion for years now and constructing/using the sg table is actually only the very minor piece of it. On the other hand there is a lot of work underway to get rid of abusing the sg tables as well. > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fintel-gfx%2F20200311152838.GA24280%40infradead.org%2F&data=02%7C01%7Cchristian.koenig%40amd.com%7C55b238b9104d4a8d4feb08d807faa11c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637268120315706063&sdata=AgVJ45%2Ft%2FVYkyIGIGgMrop69XLQReLDpF0ahL5rjEjo%3D&reserved=0 > > Are we now OK with this same approach open coded in a driver? Intel is apparently doing this as well for years, see the i915 driver internals. > This wasn't Cc'd to the usual people doing work in this PCI P2P area?? I certainly prefer a common framework for this, but when my upstream maintainer says he wants to take this who am I to object? Christian. > > See > > commit f44ffd677fb3562ac0a1ff9c8ae52672be741f00 > Author: Christian König <christian.koenig@amd.com> > Date: Fri Mar 23 16:56:37 2018 +0100 > > drm/amdgpu: add support for exporting VRAM using DMA-buf v3 > > We should be able to do this now after checking all the prerequisites. > > v2: fix entrie count in the sgt > v3: manually construct the sg > > Signed-off-by: Christian König <christian.koenig@amd.com> > Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> > Acked-by: Sumit Semwal <sumit.semwal@linaro.org> > Link: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.freedesktop.org%2Fpatch%2F359295&data=02%7C01%7Cchristian.koenig%40amd.com%7C55b238b9104d4a8d4feb08d807faa11c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637268120315706063&sdata=YzNvxBVOf5hcUm5KjOzzV%2FcHG5jdGEYmrI76PQN9v3U%3D&reserved=0 > > [..] > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c > index 82a3299e53c042..128a667ed8fa0d 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c > @@ -22,6 +22,7 @@ > * Authors: Christian König > */ > > +#include <linux/dma-mapping.h> > #include "amdgpu.h" > #include "amdgpu_vm.h" > #include "amdgpu_atomfirmware.h" > @@ -458,6 +459,104 @@ static void amdgpu_vram_mgr_del(struct ttm_mem_type_manager *man, > mem->mm_node = NULL; > } > > +/** > + * amdgpu_vram_mgr_alloc_sgt - allocate and fill a sg table > + * > + * @adev: amdgpu device pointer > + * @mem: TTM memory object > + * @dev: the other device > + * @dir: dma direction > + * @sgt: resulting sg table > + * > + * Allocate and fill a sg table from a VRAM allocation. > + */ > +int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev, > + struct ttm_mem_reg *mem, > + struct device *dev, > + enum dma_data_direction dir, > + struct sg_table **sgt) > +{ > + struct drm_mm_node *node; > + struct scatterlist *sg; > + int num_entries = 0; > + unsigned int pages; > + int i, r; > + > + *sgt = kmalloc(sizeof(*sg), GFP_KERNEL); > + if (!*sgt) > + return -ENOMEM; > + > + for (pages = mem->num_pages, node = mem->mm_node; > + pages; pages -= node->size, ++node) > + ++num_entries; > + > + r = sg_alloc_table(*sgt, num_entries, GFP_KERNEL); > + if (r) > + goto error_free; > + > + for_each_sg((*sgt)->sgl, sg, num_entries, i) > + sg->length = 0; > + > + node = mem->mm_node; > + for_each_sg((*sgt)->sgl, sg, num_entries, i) { > + phys_addr_t phys = (node->start << PAGE_SHIFT) + > + adev->gmc.aper_base; > + size_t size = node->size << PAGE_SHIFT; > + dma_addr_t addr; > + > + ++node; > + addr = dma_map_resource(dev, phys, size, dir, > + DMA_ATTR_SKIP_CPU_SYNC); > + r = dma_mapping_error(dev, addr); > + if (r) > + goto error_unmap; > + > + sg_set_page(sg, NULL, size, 0); > + sg_dma_address(sg) = addr; > + sg_dma_len(sg) = size; > ^^^^^^^^^^^^^^ > > Jason
On Tue, Jun 02, 2020 at 04:06:32PM +1000, Dave Airlie wrote: > James Jones (4): ... > drm/nouveau/kms: Support NVIDIA format modifiers This commit is the first one that breaks Xorg startup for my setup: GTX 1080 + Dell UP2414Q (4K DP MST monitor). I believe this is the crucial part of dmesg (full dmesg is attached): [ 29.997140] [drm:nouveau_framebuffer_new] Unsupported modifier: 0x300000000000014 [ 29.997143] [drm:drm_internal_framebuffer_create] could not create framebuffer [ 29.997145] [drm:drm_ioctl] pid=3393, ret = -22 Any suggestions?
This implies something is trying to use one of the old DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifiers with DRM-KMS without first checking whether it is supported by the kernel. I had tried to force an Xorg+Mesa stack without my userspace patches to hit this error when testing, but must have missed some permutation. If the stalled Mesa patches go in, this would stop happening of course, but those were held up for a long time in review, and are now waiting on me to make some modifications. Are you using the modesetting driver in X? If so, with glamor I presume? What version of Mesa? Any distro patches? Any non-default xorg.conf options that would affect modesetting, your X driver if it isn't modesetting, or glamour? Thanks, -James On 6/30/20 4:08 PM, Kirill A. Shutemov wrote: > On Tue, Jun 02, 2020 at 04:06:32PM +1000, Dave Airlie wrote: >> James Jones (4): > ... >> drm/nouveau/kms: Support NVIDIA format modifiers > > This commit is the first one that breaks Xorg startup for my setup: > GTX 1080 + Dell UP2414Q (4K DP MST monitor). > > I believe this is the crucial part of dmesg (full dmesg is attached): > > [ 29.997140] [drm:nouveau_framebuffer_new] Unsupported modifier: 0x300000000000014 > [ 29.997143] [drm:drm_internal_framebuffer_create] could not create framebuffer > [ 29.997145] [drm:drm_ioctl] pid=3393, ret = -22 > > Any suggestions? >
On Tue, Jun 30, 2020 at 09:40:19PM -0700, James Jones wrote: > This implies something is trying to use one of the old > DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifiers with DRM-KMS without > first checking whether it is supported by the kernel. I had tried to force > an Xorg+Mesa stack without my userspace patches to hit this error when > testing, but must have missed some permutation. If the stalled Mesa patches > go in, this would stop happening of course, but those were held up for a > long time in review, and are now waiting on me to make some modifications. > > Are you using the modesetting driver in X? If so, with glamor I presume? Yes and yes. I attached Xorg.log. > What version of Mesa? 20.0.8 > Any distro patches? I don't see any. It's Gentoo. > Any non-default xorg.conf options that would affect modesetting, your X > driver if it isn't modesetting, or glamour? Modesetting without anything tricky.
On Wed, Jul 01, 2020 at 10:57:19AM +0300, Kirill A. Shutemov wrote: > On Tue, Jun 30, 2020 at 09:40:19PM -0700, James Jones wrote: > > This implies something is trying to use one of the old > > DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifiers with DRM-KMS without > > first checking whether it is supported by the kernel. I had tried to force > > an Xorg+Mesa stack without my userspace patches to hit this error when > > testing, but must have missed some permutation. If the stalled Mesa patches > > go in, this would stop happening of course, but those were held up for a > > long time in review, and are now waiting on me to make some modifications. > > > > Are you using the modesetting driver in X? If so, with glamor I presume? > > Yes and yes. I attached Xorg.log. Attached now.
On Wed, Jul 1, 2020 at 6:45 AM James Jones <jajones@nvidia.com> wrote: > > This implies something is trying to use one of the old > DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifiers with DRM-KMS without > first checking whether it is supported by the kernel. I had tried to > force an Xorg+Mesa stack without my userspace patches to hit this error > when testing, but must have missed some permutation. If the stalled > Mesa patches go in, this would stop happening of course, but those were > held up for a long time in review, and are now waiting on me to make > some modifications. > that's completely irrelevant. If a kernel change breaks userspace, it's a kernel bug. > Are you using the modesetting driver in X? If so, with glamor I > presume? What version of Mesa? Any distro patches? Any non-default > xorg.conf options that would affect modesetting, your X driver if it > isn't modesetting, or glamour? > > Thanks, > -James > > On 6/30/20 4:08 PM, Kirill A. Shutemov wrote: > > On Tue, Jun 02, 2020 at 04:06:32PM +1000, Dave Airlie wrote: > >> James Jones (4): > > ... > >> drm/nouveau/kms: Support NVIDIA format modifiers > > > > This commit is the first one that breaks Xorg startup for my setup: > > GTX 1080 + Dell UP2414Q (4K DP MST monitor). > > > > I believe this is the crucial part of dmesg (full dmesg is attached): > > > > [ 29.997140] [drm:nouveau_framebuffer_new] Unsupported modifier: 0x300000000000014 > > [ 29.997143] [drm:drm_internal_framebuffer_create] could not create framebuffer > > [ 29.997145] [drm:drm_ioctl] pid=3393, ret = -22 > > > > Any suggestions? > > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel >
On 7/1/20 4:24 AM, Karol Herbst wrote: > On Wed, Jul 1, 2020 at 6:45 AM James Jones <jajones@nvidia.com> wrote: >> >> This implies something is trying to use one of the old >> DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifiers with DRM-KMS without >> first checking whether it is supported by the kernel. I had tried to >> force an Xorg+Mesa stack without my userspace patches to hit this error >> when testing, but must have missed some permutation. If the stalled >> Mesa patches go in, this would stop happening of course, but those were >> held up for a long time in review, and are now waiting on me to make >> some modifications. >> > > that's completely irrelevant. If a kernel change breaks userspace, > it's a kernel bug. Agreed it is unacceptable to break userspace, but I don't think it's irrelevant. Perhaps the musings on pending userspace patches are. My intent here was to point out it appears at first glance that something isn't behaving as expected in userspace, so fixing this would likely require some sort of work-around for broken userspace rather than straight-forward fixing of a bug in the kernel logic. My intent was not to shift blame to something besides my code & testing for the regression, though I certainly see how it could be interpreted that way. Regardless, I'm looking in to it. Thanks, -James >> Are you using the modesetting driver in X? If so, with glamor I >> presume? What version of Mesa? Any distro patches? Any non-default >> xorg.conf options that would affect modesetting, your X driver if it >> isn't modesetting, or glamour? >> >> Thanks, >> -James >> >> On 6/30/20 4:08 PM, Kirill A. Shutemov wrote: >>> On Tue, Jun 02, 2020 at 04:06:32PM +1000, Dave Airlie wrote: >>>> James Jones (4): >>> ... >>>> drm/nouveau/kms: Support NVIDIA format modifiers >>> >>> This commit is the first one that breaks Xorg startup for my setup: >>> GTX 1080 + Dell UP2414Q (4K DP MST monitor). >>> >>> I believe this is the crucial part of dmesg (full dmesg is attached): >>> >>> [ 29.997140] [drm:nouveau_framebuffer_new] Unsupported modifier: 0x300000000000014 >>> [ 29.997143] [drm:drm_internal_framebuffer_create] could not create framebuffer >>> [ 29.997145] [drm:drm_ioctl] pid=3393, ret = -22 >>> >>> Any suggestions? >>> >> _______________________________________________ >> dri-devel mailing list >> dri-devel@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/dri-devel >> > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel >
On Wed, Jul 1, 2020 at 5:51 PM James Jones <jajones@nvidia.com> wrote: > > On 7/1/20 4:24 AM, Karol Herbst wrote: > > On Wed, Jul 1, 2020 at 6:45 AM James Jones <jajones@nvidia.com> wrote: > >> > >> This implies something is trying to use one of the old > >> DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifiers with DRM-KMS without > >> first checking whether it is supported by the kernel. I had tried to > >> force an Xorg+Mesa stack without my userspace patches to hit this error > >> when testing, but must have missed some permutation. If the stalled > >> Mesa patches go in, this would stop happening of course, but those were > >> held up for a long time in review, and are now waiting on me to make > >> some modifications. > >> > > > > that's completely irrelevant. If a kernel change breaks userspace, > > it's a kernel bug. > > Agreed it is unacceptable to break userspace, but I don't think it's > irrelevant. Perhaps the musings on pending userspace patches are. > > My intent here was to point out it appears at first glance that > something isn't behaving as expected in userspace, so fixing this would > likely require some sort of work-around for broken userspace rather than > straight-forward fixing of a bug in the kernel logic. My intent was not > to shift blame to something besides my code & testing for the > regression, though I certainly see how it could be interpreted that way. > > Regardless, I'm looking in to it. If we do need to have a kernel workaround I'm happy to help out, I've done a bunch of these and occasionally it's good to get rather creative :-) Ideally we'd also push a minimal fix in userspace to all stable branches and make sure distros upgrade (might need releases if some distro is stuck on old horrors), so that we don't have to keep the hack in place for 10+ years or so. Definitely if the hack amounts to disabling modifiers on nouveau, that would be kinda sad. -Daniel > > Thanks, > -James > > >> Are you using the modesetting driver in X? If so, with glamor I > >> presume? What version of Mesa? Any distro patches? Any non-default > >> xorg.conf options that would affect modesetting, your X driver if it > >> isn't modesetting, or glamour? > >> > >> Thanks, > >> -James > >> > >> On 6/30/20 4:08 PM, Kirill A. Shutemov wrote: > >>> On Tue, Jun 02, 2020 at 04:06:32PM +1000, Dave Airlie wrote: > >>>> James Jones (4): > >>> ... > >>>> drm/nouveau/kms: Support NVIDIA format modifiers > >>> > >>> This commit is the first one that breaks Xorg startup for my setup: > >>> GTX 1080 + Dell UP2414Q (4K DP MST monitor). > >>> > >>> I believe this is the crucial part of dmesg (full dmesg is attached): > >>> > >>> [ 29.997140] [drm:nouveau_framebuffer_new] Unsupported modifier: 0x300000000000014 > >>> [ 29.997143] [drm:drm_internal_framebuffer_create] could not create framebuffer > >>> [ 29.997145] [drm:drm_ioctl] pid=3393, ret = -22 > >>> > >>> Any suggestions? > >>> > >> _______________________________________________ > >> dri-devel mailing list > >> dri-devel@lists.freedesktop.org > >> https://lists.freedesktop.org/mailman/listinfo/dri-devel > >> > > > > _______________________________________________ > > dri-devel mailing list > > dri-devel@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/dri-devel > > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
On Wed, Jul 1, 2020 at 6:01 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > > On Wed, Jul 1, 2020 at 5:51 PM James Jones <jajones@nvidia.com> wrote: > > > > On 7/1/20 4:24 AM, Karol Herbst wrote: > > > On Wed, Jul 1, 2020 at 6:45 AM James Jones <jajones@nvidia.com> wrote: > > >> > > >> This implies something is trying to use one of the old > > >> DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifiers with DRM-KMS without > > >> first checking whether it is supported by the kernel. I had tried to > > >> force an Xorg+Mesa stack without my userspace patches to hit this error > > >> when testing, but must have missed some permutation. If the stalled > > >> Mesa patches go in, this would stop happening of course, but those were > > >> held up for a long time in review, and are now waiting on me to make > > >> some modifications. > > >> > > > > > > that's completely irrelevant. If a kernel change breaks userspace, > > > it's a kernel bug. > > > > Agreed it is unacceptable to break userspace, but I don't think it's > > irrelevant. Perhaps the musings on pending userspace patches are. > > > > My intent here was to point out it appears at first glance that > > something isn't behaving as expected in userspace, so fixing this would > > likely require some sort of work-around for broken userspace rather than > > straight-forward fixing of a bug in the kernel logic. My intent was not > > to shift blame to something besides my code & testing for the > > regression, though I certainly see how it could be interpreted that way. > > > > Regardless, I'm looking in to it. > I assume the MR you were talking about is https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3724 ? I am also aware of the tegra driver being broken on my jetson nano and I am now curious if this MR could fix this bug as well... and sorry for the harsh reply, I was just a annoyed by the fact that "everything modifier related is just breaking things", first tegra and that nobody is looking into fixing it and then apparently the userspace code being quite broken as well :/ Anyway, yeah I trust you guys on figuring out the keeping "broken" userspace happy from a kernel side and maybe I can help out with reviewing the mesa bits. I am just wondering if it could help with the tegra situation giving me more reasons to look into it as this would solve other issues I should be working on :) > If we do need to have a kernel workaround I'm happy to help out, I've > done a bunch of these and occasionally it's good to get rather > creative :-) > > Ideally we'd also push a minimal fix in userspace to all stable > branches and make sure distros upgrade (might need releases if some > distro is stuck on old horrors), so that we don't have to keep the > hack in place for 10+ years or so. Definitely if the hack amounts to > disabling modifiers on nouveau, that would be kinda sad. > -Daniel > > > > > Thanks, > > -James > > > > >> Are you using the modesetting driver in X? If so, with glamor I > > >> presume? What version of Mesa? Any distro patches? Any non-default > > >> xorg.conf options that would affect modesetting, your X driver if it > > >> isn't modesetting, or glamour? > > >> > > >> Thanks, > > >> -James > > >> > > >> On 6/30/20 4:08 PM, Kirill A. Shutemov wrote: > > >>> On Tue, Jun 02, 2020 at 04:06:32PM +1000, Dave Airlie wrote: > > >>>> James Jones (4): > > >>> ... > > >>>> drm/nouveau/kms: Support NVIDIA format modifiers > > >>> > > >>> This commit is the first one that breaks Xorg startup for my setup: > > >>> GTX 1080 + Dell UP2414Q (4K DP MST monitor). > > >>> > > >>> I believe this is the crucial part of dmesg (full dmesg is attached): > > >>> > > >>> [ 29.997140] [drm:nouveau_framebuffer_new] Unsupported modifier: 0x300000000000014 > > >>> [ 29.997143] [drm:drm_internal_framebuffer_create] could not create framebuffer > > >>> [ 29.997145] [drm:drm_ioctl] pid=3393, ret = -22 > > >>> > > >>> Any suggestions? > > >>> > > >> _______________________________________________ > > >> dri-devel mailing list > > >> dri-devel@lists.freedesktop.org > > >> https://lists.freedesktop.org/mailman/listinfo/dri-devel > > >> > > > > > > _______________________________________________ > > > dri-devel mailing list > > > dri-devel@lists.freedesktop.org > > > https://lists.freedesktop.org/mailman/listinfo/dri-devel > > > > > > > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch >
On 7/1/20 10:04 AM, Karol Herbst wrote: > On Wed, Jul 1, 2020 at 6:01 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote: >> >> On Wed, Jul 1, 2020 at 5:51 PM James Jones <jajones@nvidia.com> wrote: >>> >>> On 7/1/20 4:24 AM, Karol Herbst wrote: >>>> On Wed, Jul 1, 2020 at 6:45 AM James Jones <jajones@nvidia.com> wrote: >>>>> >>>>> This implies something is trying to use one of the old >>>>> DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifiers with DRM-KMS without >>>>> first checking whether it is supported by the kernel. I had tried to >>>>> force an Xorg+Mesa stack without my userspace patches to hit this error >>>>> when testing, but must have missed some permutation. If the stalled >>>>> Mesa patches go in, this would stop happening of course, but those were >>>>> held up for a long time in review, and are now waiting on me to make >>>>> some modifications. >>>>> >>>> >>>> that's completely irrelevant. If a kernel change breaks userspace, >>>> it's a kernel bug. >>> >>> Agreed it is unacceptable to break userspace, but I don't think it's >>> irrelevant. Perhaps the musings on pending userspace patches are. >>> >>> My intent here was to point out it appears at first glance that >>> something isn't behaving as expected in userspace, so fixing this would >>> likely require some sort of work-around for broken userspace rather than >>> straight-forward fixing of a bug in the kernel logic. My intent was not >>> to shift blame to something besides my code & testing for the >>> regression, though I certainly see how it could be interpreted that way. >>> >>> Regardless, I'm looking in to it. >> > > I assume the MR you were talking about is > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3724 ? Correct. > I am > also aware of the tegra driver being broken on my jetson nano and I am > now curious if this MR could fix this bug as well... and sorry for the > harsh reply, I was just a annoyed by the fact that "everything > modifier related is just breaking things", first tegra and that nobody > is looking into fixing it and then apparently the userspace code being > quite broken as well :/ > > Anyway, yeah I trust you guys on figuring out the keeping "broken" > userspace happy from a kernel side and maybe I can help out with > reviewing the mesa bits. I am just wondering if it could help with the > tegra situation giving me more reasons to look into it as this would > solve other issues I should be working on :) Not sure if you're claiming this, but if there's Tegra breakage attributable to this patch series, I'd love to hear more details there as well. The Tegra patches did have backwards-compat code to handle the old modifiers, since Tegra was the only working use case I could find for them within the kernel itself. However, the Tegra kernel patches are independent (and haven't even been reviewed yet to my knowledge), so Tegra shouldn't be affected at all given it uses TegraDRM rather than Nouveau's modesetting driver. If there are just general existing issues with modifier support on Tegra, let's take that to a smaller venue. I probably won't be as much help there, but I can at least try to help get some eyes on it. Thanks, -James >> If we do need to have a kernel workaround I'm happy to help out, I've >> done a bunch of these and occasionally it's good to get rather >> creative :-) >> >> Ideally we'd also push a minimal fix in userspace to all stable >> branches and make sure distros upgrade (might need releases if some >> distro is stuck on old horrors), so that we don't have to keep the >> hack in place for 10+ years or so. Definitely if the hack amounts to >> disabling modifiers on nouveau, that would be kinda sad. >> -Daniel >> >>> >>> Thanks, >>> -James >>> >>>>> Are you using the modesetting driver in X? If so, with glamor I >>>>> presume? What version of Mesa? Any distro patches? Any non-default >>>>> xorg.conf options that would affect modesetting, your X driver if it >>>>> isn't modesetting, or glamour? >>>>> >>>>> Thanks, >>>>> -James >>>>> >>>>> On 6/30/20 4:08 PM, Kirill A. Shutemov wrote: >>>>>> On Tue, Jun 02, 2020 at 04:06:32PM +1000, Dave Airlie wrote: >>>>>>> James Jones (4): >>>>>> ... >>>>>>> drm/nouveau/kms: Support NVIDIA format modifiers >>>>>> >>>>>> This commit is the first one that breaks Xorg startup for my setup: >>>>>> GTX 1080 + Dell UP2414Q (4K DP MST monitor). >>>>>> >>>>>> I believe this is the crucial part of dmesg (full dmesg is attached): >>>>>> >>>>>> [ 29.997140] [drm:nouveau_framebuffer_new] Unsupported modifier: 0x300000000000014 >>>>>> [ 29.997143] [drm:drm_internal_framebuffer_create] could not create framebuffer >>>>>> [ 29.997145] [drm:drm_ioctl] pid=3393, ret = -22 >>>>>> >>>>>> Any suggestions? >>>>>> >>>>> _______________________________________________ >>>>> dri-devel mailing list >>>>> dri-devel@lists.freedesktop.org >>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel >>>>> >>>> >>>> _______________________________________________ >>>> dri-devel mailing list >>>> dri-devel@lists.freedesktop.org >>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel >>>> >> >> >> >> -- >> Daniel Vetter >> Software Engineer, Intel Corporation >> http://blog.ffwll.ch >> > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel >
On Wed, Jul 1, 2020 at 7:37 PM James Jones <jajones@nvidia.com> wrote: > > On 7/1/20 10:04 AM, Karol Herbst wrote: > > On Wed, Jul 1, 2020 at 6:01 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > >> > >> On Wed, Jul 1, 2020 at 5:51 PM James Jones <jajones@nvidia.com> wrote: > >>> > >>> On 7/1/20 4:24 AM, Karol Herbst wrote: > >>>> On Wed, Jul 1, 2020 at 6:45 AM James Jones <jajones@nvidia.com> wrote: > >>>>> > >>>>> This implies something is trying to use one of the old > >>>>> DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifiers with DRM-KMS without > >>>>> first checking whether it is supported by the kernel. I had tried to > >>>>> force an Xorg+Mesa stack without my userspace patches to hit this error > >>>>> when testing, but must have missed some permutation. If the stalled > >>>>> Mesa patches go in, this would stop happening of course, but those were > >>>>> held up for a long time in review, and are now waiting on me to make > >>>>> some modifications. > >>>>> > >>>> > >>>> that's completely irrelevant. If a kernel change breaks userspace, > >>>> it's a kernel bug. > >>> > >>> Agreed it is unacceptable to break userspace, but I don't think it's > >>> irrelevant. Perhaps the musings on pending userspace patches are. > >>> > >>> My intent here was to point out it appears at first glance that > >>> something isn't behaving as expected in userspace, so fixing this would > >>> likely require some sort of work-around for broken userspace rather than > >>> straight-forward fixing of a bug in the kernel logic. My intent was not > >>> to shift blame to something besides my code & testing for the > >>> regression, though I certainly see how it could be interpreted that way. > >>> > >>> Regardless, I'm looking in to it. > >> > > > > I assume the MR you were talking about is > > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3724 ? > > Correct. > > > I am > > also aware of the tegra driver being broken on my jetson nano and I am > > now curious if this MR could fix this bug as well... and sorry for the > > harsh reply, I was just a annoyed by the fact that "everything > > modifier related is just breaking things", first tegra and that nobody > > is looking into fixing it and then apparently the userspace code being > > quite broken as well :/ > > > > Anyway, yeah I trust you guys on figuring out the keeping "broken" > > userspace happy from a kernel side and maybe I can help out with > > reviewing the mesa bits. I am just wondering if it could help with the > > tegra situation giving me more reasons to look into it as this would > > solve other issues I should be working on :) > > Not sure if you're claiming this, but if there's Tegra breakage > attributable to this patch series, I'd love to hear more details there > as well. The Tegra patches did have backwards-compat code to handle the > old modifiers, since Tegra was the only working use case I could find > for them within the kernel itself. However, the Tegra kernel patches > are independent (and haven't even been reviewed yet to my knowledge), so > Tegra shouldn't be affected at all given it uses TegraDRM rather than > Nouveau's modesetting driver. > > If there are just general existing issues with modifier support on > Tegra, let's take that to a smaller venue. I probably won't be as much > help there, but I can at least try to help get some eyes on it. > I am sure that your patches here have nothing to do with it, just inside mesa since https://gitlab.freedesktop.org/mesa/mesa/commit/c56fe4118a2dd991ff1b2a532c0f234eddd435e9 it's broken on the jetson nano and because it's so old I am not able to tell if this is because of some kernel changes or because of the modifier code inside mesa being slightly broken. Maybe you have an idea, but Thierry knows about the issue, but I think he never was able to reproduce it on any system. > Thanks, > -James > > >> If we do need to have a kernel workaround I'm happy to help out, I've > >> done a bunch of these and occasionally it's good to get rather > >> creative :-) > >> > >> Ideally we'd also push a minimal fix in userspace to all stable > >> branches and make sure distros upgrade (might need releases if some > >> distro is stuck on old horrors), so that we don't have to keep the > >> hack in place for 10+ years or so. Definitely if the hack amounts to > >> disabling modifiers on nouveau, that would be kinda sad. > >> -Daniel > >> > >>> > >>> Thanks, > >>> -James > >>> > >>>>> Are you using the modesetting driver in X? If so, with glamor I > >>>>> presume? What version of Mesa? Any distro patches? Any non-default > >>>>> xorg.conf options that would affect modesetting, your X driver if it > >>>>> isn't modesetting, or glamour? > >>>>> > >>>>> Thanks, > >>>>> -James > >>>>> > >>>>> On 6/30/20 4:08 PM, Kirill A. Shutemov wrote: > >>>>>> On Tue, Jun 02, 2020 at 04:06:32PM +1000, Dave Airlie wrote: > >>>>>>> James Jones (4): > >>>>>> ... > >>>>>>> drm/nouveau/kms: Support NVIDIA format modifiers > >>>>>> > >>>>>> This commit is the first one that breaks Xorg startup for my setup: > >>>>>> GTX 1080 + Dell UP2414Q (4K DP MST monitor). > >>>>>> > >>>>>> I believe this is the crucial part of dmesg (full dmesg is attached): > >>>>>> > >>>>>> [ 29.997140] [drm:nouveau_framebuffer_new] Unsupported modifier: 0x300000000000014 > >>>>>> [ 29.997143] [drm:drm_internal_framebuffer_create] could not create framebuffer > >>>>>> [ 29.997145] [drm:drm_ioctl] pid=3393, ret = -22 > >>>>>> > >>>>>> Any suggestions? > >>>>>> > >>>>> _______________________________________________ > >>>>> dri-devel mailing list > >>>>> dri-devel@lists.freedesktop.org > >>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel > >>>>> > >>>> > >>>> _______________________________________________ > >>>> dri-devel mailing list > >>>> dri-devel@lists.freedesktop.org > >>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel > >>>> > >> > >> > >> > >> -- > >> Daniel Vetter > >> Software Engineer, Intel Corporation > >> http://blog.ffwll.ch > >> > > > > _______________________________________________ > > dri-devel mailing list > > dri-devel@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/dri-devel > > >
OK, I think I see what's going on. In the Xorg modesetting driver, the logic is basically: if (gbm_has_modifiers && DRM_CAP_ADDFB2_MODIFIERS != 0) { drmModeAddFB2WithModifiers(..., gbm_bo_get_modifier(bo->gbm)); } else { drmModeAddFB(...); } There's no attempt to verify the DRM-KMS device supports the modifier, but then, why would there be? GBM presumably chose a supported modifier at buffer creation time, and we don't know which plane the FB is going to be used with yet. GBM doesn't actually ask the kernel which modifiers it supports here either though. It just goes into Mesa via DRI and reports the modifier (unpatched) Mesa chose on its own. Mesa just hard-codes the modifiers in its driver backends since its thinking in terms of a device's 3D engine, not display. In theory, Mesa's DRI drivers could query KMS for supported modifiers if allocating from GBM using the non-modifiers path and the SCANOUT flag is set (perhaps some drivers do this or its equivalent? Haven't checked.), but that seems pretty gnarly and doesn't fix the modifier-based GBM allocation path AFAIK. Bit of a mess. For a quick userspace fix that could probably be pushed out everywhere (Only affects Xorg server 1.20+ AFAIK), just retrying drmModeAddFB2WithModifiers() without the DRM_MODE_FB_MODIFIERS flag on failure should be sufficient. Still need to verify as I'm having trouble wrangling my Xorg build at the moment and I'm pressed for time. A more complete fix would be quite involved, as modesetting isn't really properly plumbed to validate GBM's modifiers against KMS planes, and it doesn't seem like GBM/Mesa/DRI should be responsible for this as noted above given the general modifier workflow/design. Most importantly, options I've considered for fixing from the kernel side: -Accept "legacy" modifiers in nouveau in addition to the new modifiers, though avoid reporting them to userspace as supported to avoid further proliferation. This is pretty straightforward. I'll need to modify both the AddFB2 handler (nouveau_validate_decode_mod) and the mode set plane validation logic (nv50_plane_format_mod_supported), but it should end up just being a few lines of code. -Don't validate modifiers in AddFB. This doesn't really gain anything because it just pushes the failure down to mode set time, so it's not that useful, so I don't plan on pursuing this. As noted, need to run just now, but I should have a kernel patch to test out either tonight or tomorrow. If anyone's curious, the reason my testing missed this was I did most of my verification of "old" code against the Xorg 1.19 build included with my distro. I did hack up a Xorg 1.20-ish build to test as well that would have included this path, but I must not have properly configured it with GBM modifier support somehow. I was pretty focused on just testing the forcibly-disabled atomic path in the modesetting driver in this build, so I didn't look too closely at things beyond that. Thanks, -James On 7/1/20 12:59 AM, Kirill A. Shutemov wrote: > On Wed, Jul 01, 2020 at 10:57:19AM +0300, Kirill A. Shutemov wrote: >> On Tue, Jun 30, 2020 at 09:40:19PM -0700, James Jones wrote: >>> This implies something is trying to use one of the old >>> DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifiers with DRM-KMS without >>> first checking whether it is supported by the kernel. I had tried to force >>> an Xorg+Mesa stack without my userspace patches to hit this error when >>> testing, but must have missed some permutation. If the stalled Mesa patches >>> go in, this would stop happening of course, but those were held up for a >>> long time in review, and are now waiting on me to make some modifications. >>> >>> Are you using the modesetting driver in X? If so, with glamor I presume? >> >> Yes and yes. I attached Xorg.log. > > Attached now. > > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel >
On Wed, Jul 1, 2020 at 9:45 PM James Jones <jajones@nvidia.com> wrote: > > OK, I think I see what's going on. In the Xorg modesetting driver, the > logic is basically: > > if (gbm_has_modifiers && DRM_CAP_ADDFB2_MODIFIERS != 0) { > drmModeAddFB2WithModifiers(..., gbm_bo_get_modifier(bo->gbm)); > } else { > drmModeAddFB(...); > } > > There's no attempt to verify the DRM-KMS device supports the modifier, > but then, why would there be? GBM presumably chose a supported modifier > at buffer creation time, and we don't know which plane the FB is going > to be used with yet. GBM doesn't actually ask the kernel which > modifiers it supports here either though. It just goes into Mesa via > DRI and reports the modifier (unpatched) Mesa chose on its own. Mesa > just hard-codes the modifiers in its driver backends since its thinking > in terms of a device's 3D engine, not display. In theory, Mesa's DRI > drivers could query KMS for supported modifiers if allocating from GBM > using the non-modifiers path and the SCANOUT flag is set (perhaps some > drivers do this or its equivalent? Haven't checked.), but that seems > pretty gnarly and doesn't fix the modifier-based GBM allocation path > AFAIK. Bit of a mess. > > For a quick userspace fix that could probably be pushed out everywhere > (Only affects Xorg server 1.20+ AFAIK), just retrying > drmModeAddFB2WithModifiers() without the DRM_MODE_FB_MODIFIERS flag on > failure should be sufficient. Still need to verify as I'm having > trouble wrangling my Xorg build at the moment and I'm pressed for time. > A more complete fix would be quite involved, as modesetting isn't really > properly plumbed to validate GBM's modifiers against KMS planes, and it > doesn't seem like GBM/Mesa/DRI should be responsible for this as noted > above given the general modifier workflow/design. > > Most importantly, options I've considered for fixing from the kernel side: > > -Accept "legacy" modifiers in nouveau in addition to the new modifiers, > though avoid reporting them to userspace as supported to avoid further > proliferation. This is pretty straightforward. I'll need to modify > both the AddFB2 handler (nouveau_validate_decode_mod) and the mode set > plane validation logic (nv50_plane_format_mod_supported), but it should > end up just being a few lines of code. > > -Don't validate modifiers in AddFB. This doesn't really gain anything > because it just pushes the failure down to mode set time, so it's not > that useful, so I don't plan on pursuing this. > > As noted, need to run just now, but I should have a kernel patch to test > out either tonight or tomorrow. > > If anyone's curious, the reason my testing missed this was I did most of > my verification of "old" code against the Xorg 1.19 build included with > my distro. I did hack up a Xorg 1.20-ish build to test as well that > would have included this path, but I must not have properly configured > it with GBM modifier support somehow. I was pretty focused on just > testing the forcibly-disabled atomic path in the modesetting driver in > this build, so I didn't look too closely at things beyond that. Yeah, so modifier support in -modesetting is totally broken. It should be disabled by default because the stuff just doesn't really work. If that doesn't help then I guess we need to do the same thing as what we've done for atomic already in: commit 26b1d3b527e7bf3e24b814d617866ac5199ce68d Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Thu Sep 5 20:53:18 2019 +0200 drm/atomic: Take the atomic toys away from X For added insult for the atomic stuff: Xorg master branch is actually fixed, but no one has done a release of that in over 2 years, so the fixes never got anywhere. I have little hope the Xorg will get back into shape, seems abandoned unfortunately. -Daniel > > Thanks, > -James > > On 7/1/20 12:59 AM, Kirill A. Shutemov wrote: > > On Wed, Jul 01, 2020 at 10:57:19AM +0300, Kirill A. Shutemov wrote: > >> On Tue, Jun 30, 2020 at 09:40:19PM -0700, James Jones wrote: > >>> This implies something is trying to use one of the old > >>> DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifiers with DRM-KMS without > >>> first checking whether it is supported by the kernel. I had tried to force > >>> an Xorg+Mesa stack without my userspace patches to hit this error when > >>> testing, but must have missed some permutation. If the stalled Mesa patches > >>> go in, this would stop happening of course, but those were held up for a > >>> long time in review, and are now waiting on me to make some modifications. > >>> > >>> Are you using the modesetting driver in X? If so, with glamor I presume? > >> > >> Yes and yes. I attached Xorg.log. > > > > Attached now. > > > > > > _______________________________________________ > > dri-devel mailing list > > dri-devel@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/dri-devel > >
On Wed, 1 Jul 2020 12:45:48 -0700 James Jones <jajones@nvidia.com> wrote: > OK, I think I see what's going on. In the Xorg modesetting driver, the > logic is basically: > > if (gbm_has_modifiers && DRM_CAP_ADDFB2_MODIFIERS != 0) { > drmModeAddFB2WithModifiers(..., gbm_bo_get_modifier(bo->gbm)); > } else { > drmModeAddFB(...); > } > > There's no attempt to verify the DRM-KMS device supports the modifier, > but then, why would there be? GBM presumably chose a supported modifier > at buffer creation time, and we don't know which plane the FB is going > to be used with yet. GBM doesn't actually ask the kernel which > modifiers it supports here either though. It just goes into Mesa via > DRI and reports the modifier (unpatched) Mesa chose on its own. Mesa > just hard-codes the modifiers in its driver backends since its thinking > in terms of a device's 3D engine, not display. In theory, Mesa's DRI > drivers could query KMS for supported modifiers if allocating from GBM > using the non-modifiers path and the SCANOUT flag is set (perhaps some > drivers do this or its equivalent? Haven't checked.), but that seems > pretty gnarly and doesn't fix the modifier-based GBM allocation path > AFAIK. Bit of a mess. Hi, the way I believe it is supposed to work is that modesetting DDX first asks KMS what modifiers it supports, and then passes that list to GBM when it is attempting to create a gbm_bo or a gbm_surface. If modesetting does not do that and still uses modifiers API for creating bos or surfaces, modesetting is broken I believe. gbm_bo_create_with_modifiers() gbm_surface_create_with_modifiers() Of course, this doesn't affect the need of a kernel workaround. How is modesetting creating the bo or surface currently? Why would the KMS driver starting to support modifiers API suddenly make the buffers fail, does it change how the buffers get allocated in modesetting as well? Or maybe you're talking about something else than what I assumed, I'm not really up-to-date here. Thanks, pq
Hi, On Wed, 1 Jul 2020 at 20:45, James Jones <jajones@nvidia.com> wrote: > OK, I think I see what's going on. In the Xorg modesetting driver, the > logic is basically: > > if (gbm_has_modifiers && DRM_CAP_ADDFB2_MODIFIERS != 0) { > drmModeAddFB2WithModifiers(..., gbm_bo_get_modifier(bo->gbm)); > } else { > drmModeAddFB(...); > } I read this thread expecting to explain the correct behaviour we implement in Weston and how modesetting needs to be fixed, but ... that seems OK to me? As long as `gbm_has_modifiers` is a proxy for 'we used gbm_(bo|surface)_create_with_modifiers to allocate the buffer'. > There's no attempt to verify the DRM-KMS device supports the modifier, > but then, why would there be? GBM presumably chose a supported modifier > at buffer creation time, and we don't know which plane the FB is going > to be used with yet. GBM doesn't actually ask the kernel which > modifiers it supports here either though. Right, it doesn't ask, because userspace tells it which modifiers to use. The correct behaviour is to take the list from the KMS `IN_FORMATS` property and then pass that to `gbm_(bo|surface)_create_with_modifiers`; GBM must then select from that list and only that list. If that call does not succeed and Xorg falls back to `gbm_surface_create`, then it must not call `gbm_bo_get_modifier` - so that would be a modesetting bug. If that call does succeed and `gbm_bo_get_modifier` subsequently reports a modifier which was not in the list, that's a Mesa driver bug. > It just goes into Mesa via > DRI and reports the modifier (unpatched) Mesa chose on its own. Mesa > just hard-codes the modifiers in its driver backends since its thinking > in terms of a device's 3D engine, not display. In theory, Mesa's DRI > drivers could query KMS for supported modifiers if allocating from GBM > using the non-modifiers path and the SCANOUT flag is set (perhaps some > drivers do this or its equivalent? Haven't checked.), but that seems > pretty gnarly and doesn't fix the modifier-based GBM allocation path > AFAIK. Bit of a mess. Two options for GBM users: * call gbm_*_create_with_modifiers, it succeeds, call gbm_bo_get_modifier, pass modifier into AddFB * call gbm_*_create (without modifiers), it succeeds, do not call gbm_bo_get_modifier, do not pass a modifier into AddFB Anything else is a bug in the user. Note that falling back from 1 to 2 is fine: if `gbm_*_create_with_modifiers()` fails, you can fall back to the non-modifier path, provided you don't later try to get a modifier back out. > For a quick userspace fix that could probably be pushed out everywhere > (Only affects Xorg server 1.20+ AFAIK), just retrying > drmModeAddFB2WithModifiers() without the DRM_MODE_FB_MODIFIERS flag on > failure should be sufficient. This would break other drivers. > Still need to verify as I'm having > trouble wrangling my Xorg build at the moment and I'm pressed for time. > A more complete fix would be quite involved, as modesetting isn't really > properly plumbed to validate GBM's modifiers against KMS planes, and it > doesn't seem like GBM/Mesa/DRI should be responsible for this as noted > above given the general modifier workflow/design. > > Most importantly, options I've considered for fixing from the kernel side: > > -Accept "legacy" modifiers in nouveau in addition to the new modifiers, > though avoid reporting them to userspace as supported to avoid further > proliferation. This is pretty straightforward. I'll need to modify > both the AddFB2 handler (nouveau_validate_decode_mod) and the mode set > plane validation logic (nv50_plane_format_mod_supported), but it should > end up just being a few lines of code. I do think that they should also be reported to userspace if they are accepted. Other users can and do look at the modifier list to see if the buffer is acceptable for a given plane, so the consistency is good here. Of course, in Mesa you would want to prioritise the new modifiers over the legacy ones, and not allocate or return the legacy ones unless that was all you were asked for. This would involve tracking the used modifier explicitly through Mesa, rather than throwing it away at alloc time and then later divining it from the tiling mode. Cheers, Daniel
On 7/2/20 1:22 AM, Daniel Stone wrote: > Hi, > > On Wed, 1 Jul 2020 at 20:45, James Jones <jajones@nvidia.com> wrote: >> OK, I think I see what's going on. In the Xorg modesetting driver, the >> logic is basically: >> >> if (gbm_has_modifiers && DRM_CAP_ADDFB2_MODIFIERS != 0) { >> drmModeAddFB2WithModifiers(..., gbm_bo_get_modifier(bo->gbm)); >> } else { >> drmModeAddFB(...); >> } > > I read this thread expecting to explain the correct behaviour we > implement in Weston and how modesetting needs to be fixed, but ... > that seems OK to me? As long as `gbm_has_modifiers` is a proxy for 'we > used gbm_(bo|surface)_create_with_modifiers to allocate the buffer'. Yes, the hazards of reporting findings before verifying. I now see modesetting does query the DRM-KMS modifiers and attempt to allocate with them if it found any. However, I still see a lot of ways things can go wrong, but I'm not going to share my speculation again until I've actually verified it, which is taking a frustratingly long time. The modesetting driver is not my friend right now. >> There's no attempt to verify the DRM-KMS device supports the modifier, >> but then, why would there be? GBM presumably chose a supported modifier >> at buffer creation time, and we don't know which plane the FB is going >> to be used with yet. GBM doesn't actually ask the kernel which >> modifiers it supports here either though. > > Right, it doesn't ask, because userspace tells it which modifiers to > use. The correct behaviour is to take the list from the KMS > `IN_FORMATS` property and then pass that to > `gbm_(bo|surface)_create_with_modifiers`; GBM must then select from > that list and only that list. If that call does not succeed and Xorg > falls back to `gbm_surface_create`, then it must not call > `gbm_bo_get_modifier` - so that would be a modesetting bug. If that > call does succeed and `gbm_bo_get_modifier` subsequently reports a > modifier which was not in the list, that's a Mesa driver bug. > >> It just goes into Mesa via >> DRI and reports the modifier (unpatched) Mesa chose on its own. Mesa >> just hard-codes the modifiers in its driver backends since its thinking >> in terms of a device's 3D engine, not display. In theory, Mesa's DRI >> drivers could query KMS for supported modifiers if allocating from GBM >> using the non-modifiers path and the SCANOUT flag is set (perhaps some >> drivers do this or its equivalent? Haven't checked.), but that seems >> pretty gnarly and doesn't fix the modifier-based GBM allocation path >> AFAIK. Bit of a mess. > > Two options for GBM users: > * call gbm_*_create_with_modifiers, it succeeds, call > gbm_bo_get_modifier, pass modifier into AddFB > * call gbm_*_create (without modifiers), it succeeds, do not call > gbm_bo_get_modifier, do not pass a modifier into AddFB > > Anything else is a bug in the user. Note that falling back from 1 to 2 > is fine: if `gbm_*_create_with_modifiers()` fails, you can fall back > to the non-modifier path, provided you don't later try to get a > modifier back out. > >> For a quick userspace fix that could probably be pushed out everywhere >> (Only affects Xorg server 1.20+ AFAIK), just retrying >> drmModeAddFB2WithModifiers() without the DRM_MODE_FB_MODIFIERS flag on >> failure should be sufficient. > > This would break other drivers. I think this could be done in a way that wouldn't, though it wouldn't be quite as simple. Let's see what the true root cause is first though. >> Still need to verify as I'm having >> trouble wrangling my Xorg build at the moment and I'm pressed for time. >> A more complete fix would be quite involved, as modesetting isn't really >> properly plumbed to validate GBM's modifiers against KMS planes, and it >> doesn't seem like GBM/Mesa/DRI should be responsible for this as noted >> above given the general modifier workflow/design. >> >> Most importantly, options I've considered for fixing from the kernel side: >> >> -Accept "legacy" modifiers in nouveau in addition to the new modifiers, >> though avoid reporting them to userspace as supported to avoid further >> proliferation. This is pretty straightforward. I'll need to modify >> both the AddFB2 handler (nouveau_validate_decode_mod) and the mode set >> plane validation logic (nv50_plane_format_mod_supported), but it should >> end up just being a few lines of code. > > I do think that they should also be reported to userspace if they are > accepted. Other users can and do look at the modifier list to see if > the buffer is acceptable for a given plane, so the consistency is good > here. Of course, in Mesa you would want to prioritise the new > modifiers over the legacy ones, and not allocate or return the legacy > ones unless that was all you were asked for. This would involve > tracking the used modifier explicitly through Mesa, rather than > throwing it away at alloc time and then later divining it from the > tiling mode. Reporting them as supported is equivalent to reporting support for a memory layout the chips don't actually support (It corresponds to a valid layout on Tegra chips, but not on discrete NV chips). This is what the new modifiers are trying to avoid in the first place: Implying buffers can be shared between these Tegra chips and discrete NV GPUs. Thanks, -James > Cheers, > Daniel >
On 7/2/20 2:14 PM, James Jones wrote: > On 7/2/20 1:22 AM, Daniel Stone wrote: >> Hi, >> >> On Wed, 1 Jul 2020 at 20:45, James Jones <jajones@nvidia.com> wrote: >>> OK, I think I see what's going on. In the Xorg modesetting driver, the >>> logic is basically: >>> >>> if (gbm_has_modifiers && DRM_CAP_ADDFB2_MODIFIERS != 0) { >>> drmModeAddFB2WithModifiers(..., gbm_bo_get_modifier(bo->gbm)); >>> } else { >>> drmModeAddFB(...); >>> } >> >> I read this thread expecting to explain the correct behaviour we >> implement in Weston and how modesetting needs to be fixed, but ... >> that seems OK to me? As long as `gbm_has_modifiers` is a proxy for 'we >> used gbm_(bo|surface)_create_with_modifiers to allocate the buffer'. > > Yes, the hazards of reporting findings before verifying. I now see > modesetting does query the DRM-KMS modifiers and attempt to allocate > with them if it found any. However, I still see a lot of ways things > can go wrong, but I'm not going to share my speculation again until I've > actually verified it, which is taking a frustratingly long time. The > modesetting driver is not my friend right now. OK, several hours of dumb build+config mistakes later, I was actually able to reproduce the failure and walk through things. There is a trivial fix for the issues in the X modesetting driver, working off Daniel Stone's claim that gbm_bo_get_modifier() should only be called when the allocation was made with gbm_bo_create_with_modifiers(). modeset doesn't respect that requirement now in the case that the atomic modesetting path is disabled, which is always the case currently because that path is broken. Respecting that requirement is a half-liner and allows X to start properly. If I force modeset to use the atomic path, X still fails to start with the above fix, validating the second theory I'd had: -Current Mesa nouveau code basically ignores the modifier list passed in unless it is a single modifier requesting linear layout, and goes about allocating whatever layout it sees fit, and succeeds the allocation despite being passed a list of modifiers it knows nothing about. Not great, fixed in my pending patches, obviously doesn't help existing deployed userspace. -Current Mesa nouveau code, when asked what modifier it used for the above allocation, returns one of the "legacy" modifiers nouveau DRM-KMS knows nothing about. -When the modeset driver tries to create an FB for that BO with the returned modifier, the nouveau kernel driver of course refuses. I think it's probably worth fixing the modesetting driver for the reasons Daniel Vetter mentioned. Then if I get my Mesa patches in before a new modesetting driver with working Atomic support is released, there'll be no need for long-term workarounds in the kernel. Down to the real question of what to do in the kernel to support current userspace code: I still think the best fix is to accept the old modifiers but not advertise them. However, Daniel Stone and others, if you think this will actually break userspace in other ways (Could you describe in a bit more detail or point me to test cases if so?), I suppose the only option would be to advertise & accept the old modifiers for now, and I suppose at a config option at some point to phase the old ones out, eventually drop them entirely. This would be unfortunate, because as I mentioned, it could sometimes result in situations where apps think they can share a buffer between two devices but will get garbled data in practice. I've included an initial version of the kernel patch inline below. Needs more testing, but I wanted to share it in case anyone has feedback on the idea, wants to see the general workflow, or wants to help test. >>> There's no attempt to verify the DRM-KMS device supports the modifier, >>> but then, why would there be? GBM presumably chose a supported modifier >>> at buffer creation time, and we don't know which plane the FB is going >>> to be used with yet. GBM doesn't actually ask the kernel which >>> modifiers it supports here either though. >> >> Right, it doesn't ask, because userspace tells it which modifiers to >> use. The correct behaviour is to take the list from the KMS >> `IN_FORMATS` property and then pass that to >> `gbm_(bo|surface)_create_with_modifiers`; GBM must then select from >> that list and only that list. If that call does not succeed and Xorg >> falls back to `gbm_surface_create`, then it must not call >> `gbm_bo_get_modifier` - so that would be a modesetting bug. If that >> call does succeed and `gbm_bo_get_modifier` subsequently reports a >> modifier which was not in the list, that's a Mesa driver bug. >> >>> It just goes into Mesa via >>> DRI and reports the modifier (unpatched) Mesa chose on its own. Mesa >>> just hard-codes the modifiers in its driver backends since its thinking >>> in terms of a device's 3D engine, not display. In theory, Mesa's DRI >>> drivers could query KMS for supported modifiers if allocating from GBM >>> using the non-modifiers path and the SCANOUT flag is set (perhaps some >>> drivers do this or its equivalent? Haven't checked.), but that seems >>> pretty gnarly and doesn't fix the modifier-based GBM allocation path >>> AFAIK. Bit of a mess. >> >> Two options for GBM users: >> * call gbm_*_create_with_modifiers, it succeeds, call >> gbm_bo_get_modifier, pass modifier into AddFB >> * call gbm_*_create (without modifiers), it succeeds, do not call >> gbm_bo_get_modifier, do not pass a modifier into AddFB >> >> Anything else is a bug in the user. Note that falling back from 1 to 2 >> is fine: if `gbm_*_create_with_modifiers()` fails, you can fall back >> to the non-modifier path, provided you don't later try to get a >> modifier back out. >> >>> For a quick userspace fix that could probably be pushed out everywhere >>> (Only affects Xorg server 1.20+ AFAIK), just retrying >>> drmModeAddFB2WithModifiers() without the DRM_MODE_FB_MODIFIERS flag on >>> failure should be sufficient. >> >> This would break other drivers. > > I think this could be done in a way that wouldn't, though it wouldn't be > quite as simple. Let's see what the true root cause is first though. > >>> Still need to verify as I'm having >>> trouble wrangling my Xorg build at the moment and I'm pressed for time. >>> A more complete fix would be quite involved, as modesetting isn't really >>> properly plumbed to validate GBM's modifiers against KMS planes, and it >>> doesn't seem like GBM/Mesa/DRI should be responsible for this as noted >>> above given the general modifier workflow/design. >>> >>> Most importantly, options I've considered for fixing from the kernel >>> side: >>> >>> -Accept "legacy" modifiers in nouveau in addition to the new modifiers, >>> though avoid reporting them to userspace as supported to avoid further >>> proliferation. This is pretty straightforward. I'll need to modify >>> both the AddFB2 handler (nouveau_validate_decode_mod) and the mode set >>> plane validation logic (nv50_plane_format_mod_supported), but it should >>> end up just being a few lines of code. >> >> I do think that they should also be reported to userspace if they are >> accepted. Other users can and do look at the modifier list to see if >> the buffer is acceptable for a given plane, so the consistency is good >> here. Of course, in Mesa you would want to prioritise the new >> modifiers over the legacy ones, and not allocate or return the legacy >> ones unless that was all you were asked for. This would involve >> tracking the used modifier explicitly through Mesa, rather than >> throwing it away at alloc time and then later divining it from the >> tiling mode. > > Reporting them as supported is equivalent to reporting support for a > memory layout the chips don't actually support (It corresponds to a > valid layout on Tegra chips, but not on discrete NV chips). This is > what the new modifiers are trying to avoid in the first place: Implying > buffers can be shared between these Tegra chips and discrete NV GPUs. > > Thanks, > -James > >> Cheers, >> Daniel >> nouveau: Accept 'legacy' format modifiers Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK() family of modifiers to handle broken userspace Xorg modesetting and Mesa drivers. --- drivers/gpu/drm/nouveau/nouveau_display.c | 26 +++++++++++++++++++++-- 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c index 496c4621cc78..31543086254b 100644 --- a/drivers/gpu/drm/nouveau/nouveau_display.c +++ b/drivers/gpu/drm/nouveau/nouveau_display.c @@ -191,8 +191,14 @@ nouveau_decode_mod(struct nouveau_drm *drm, uint32_t *tile_mode, uint8_t *kind) { + struct nouveau_display *disp = nouveau_display(drm->dev); BUG_ON(!tile_mode || !kind); + if ((modifier & (0xffull << 12)) == 0ull) { + /* Legacy modifier. Translate to this device's 'kind.' */ + modifier |= disp->format_modifiers[0] & (0xffull << 12); + } + if (modifier == DRM_FORMAT_MOD_LINEAR) { /* tile_mode will not be used in this case */ *tile_mode = 0; @@ -227,6 +233,16 @@ nouveau_framebuffer_get_layout(struct drm_framebuffer *fb, } } +static const u64 legacy_modifiers[] = { + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0), + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1), + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2), + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3), + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4), + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5), + DRM_FORMAT_MOD_INVALID +}; + static int nouveau_validate_decode_mod(struct nouveau_drm *drm, uint64_t modifier, @@ -247,8 +263,14 @@ nouveau_validate_decode_mod(struct nouveau_drm *drm, (disp->format_modifiers[mod] != modifier); mod++); - if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) - return -EINVAL; + if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) { + for (mod = 0; + (legacy_modifiers[mod] != DRM_FORMAT_MOD_INVALID) && + (legacy_modifiers[mod] != modifier); + mod++); + if (legacy_modifiers[mod] == DRM_FORMAT_MOD_INVALID) + return -EINVAL; + } nouveau_decode_mod(drm, modifier, tile_mode, kind);
On Fri, Jul 3, 2020 at 8:01 AM James Jones <jajones@nvidia.com> wrote: > > On 7/2/20 2:14 PM, James Jones wrote: > > On 7/2/20 1:22 AM, Daniel Stone wrote: > >> Hi, > >> > >> On Wed, 1 Jul 2020 at 20:45, James Jones <jajones@nvidia.com> wrote: > >>> OK, I think I see what's going on. In the Xorg modesetting driver, the > >>> logic is basically: > >>> > >>> if (gbm_has_modifiers && DRM_CAP_ADDFB2_MODIFIERS != 0) { > >>> drmModeAddFB2WithModifiers(..., gbm_bo_get_modifier(bo->gbm)); > >>> } else { > >>> drmModeAddFB(...); > >>> } > >> > >> I read this thread expecting to explain the correct behaviour we > >> implement in Weston and how modesetting needs to be fixed, but ... > >> that seems OK to me? As long as `gbm_has_modifiers` is a proxy for 'we > >> used gbm_(bo|surface)_create_with_modifiers to allocate the buffer'. > > > > Yes, the hazards of reporting findings before verifying. I now see > > modesetting does query the DRM-KMS modifiers and attempt to allocate > > with them if it found any. However, I still see a lot of ways things > > can go wrong, but I'm not going to share my speculation again until I've > > actually verified it, which is taking a frustratingly long time. The > > modesetting driver is not my friend right now. > > OK, several hours of dumb build+config mistakes later, I was actually > able to reproduce the failure and walk through things. There is a > trivial fix for the issues in the X modesetting driver, working off > Daniel Stone's claim that gbm_bo_get_modifier() should only be called > when the allocation was made with gbm_bo_create_with_modifiers(). > modeset doesn't respect that requirement now in the case that the atomic > modesetting path is disabled, which is always the case currently because > that path is broken. Respecting that requirement is a half-liner and > allows X to start properly. > > If I force modeset to use the atomic path, X still fails to start with > the above fix, validating the second theory I'd had: > > -Current Mesa nouveau code basically ignores the modifier list passed in > unless it is a single modifier requesting linear layout, and goes about > allocating whatever layout it sees fit, and succeeds the allocation > despite being passed a list of modifiers it knows nothing about. Not > great, fixed in my pending patches, obviously doesn't help existing > deployed userspace. > > -Current Mesa nouveau code, when asked what modifier it used for the > above allocation, returns one of the "legacy" modifiers nouveau DRM-KMS > knows nothing about. > > -When the modeset driver tries to create an FB for that BO with the > returned modifier, the nouveau kernel driver of course refuses. > > I think it's probably worth fixing the modesetting driver for the > reasons Daniel Vetter mentioned. Then if I get my Mesa patches in > before a new modesetting driver with working Atomic support is released, > there'll be no need for long-term workarounds in the kernel. So atm there's 0 working -modesetting with atomic because all of them are caught up by the kernel's check to refuse to hand out atomic support to X. I think we'd also need a patch to pump the atomic setcap to 2 (which should bypass the kernel's hack), plus of course someone needs to release 1.21. > Down to the real question of what to do in the kernel to support current > userspace code: I still think the best fix is to accept the old > modifiers but not advertise them. However, Daniel Stone and others, if > you think this will actually break userspace in other ways (Could you > describe in a bit more detail or point me to test cases if so?), I > suppose the only option would be to advertise & accept the old modifiers > for now, and I suppose at a config option at some point to phase the old > ones out, eventually drop them entirely. This would be unfortunate, > because as I mentioned, it could sometimes result in situations where > apps think they can share a buffer between two devices but will get > garbled data in practice. > > I've included an initial version of the kernel patch inline below. > Needs more testing, but I wanted to share it in case anyone has feedback > on the idea, wants to see the general workflow, or wants to help test. I think the only risk I'm seeing if nouveau currently also allocates formats (sometimes, maybe) that you then can't always scan out. If that's not the case, then silently allowing old modifiers to keep things working sounds like the best option. Otherwise sounds all reasonable, maybe more comments and hints to what exactly is broken in userspace for the real patch (once tested and all). -Daniel > >>> There's no attempt to verify the DRM-KMS device supports the modifier, > >>> but then, why would there be? GBM presumably chose a supported modifier > >>> at buffer creation time, and we don't know which plane the FB is going > >>> to be used with yet. GBM doesn't actually ask the kernel which > >>> modifiers it supports here either though. > >> > >> Right, it doesn't ask, because userspace tells it which modifiers to > >> use. The correct behaviour is to take the list from the KMS > >> `IN_FORMATS` property and then pass that to > >> `gbm_(bo|surface)_create_with_modifiers`; GBM must then select from > >> that list and only that list. If that call does not succeed and Xorg > >> falls back to `gbm_surface_create`, then it must not call > >> `gbm_bo_get_modifier` - so that would be a modesetting bug. If that > >> call does succeed and `gbm_bo_get_modifier` subsequently reports a > >> modifier which was not in the list, that's a Mesa driver bug. > >> > >>> It just goes into Mesa via > >>> DRI and reports the modifier (unpatched) Mesa chose on its own. Mesa > >>> just hard-codes the modifiers in its driver backends since its thinking > >>> in terms of a device's 3D engine, not display. In theory, Mesa's DRI > >>> drivers could query KMS for supported modifiers if allocating from GBM > >>> using the non-modifiers path and the SCANOUT flag is set (perhaps some > >>> drivers do this or its equivalent? Haven't checked.), but that seems > >>> pretty gnarly and doesn't fix the modifier-based GBM allocation path > >>> AFAIK. Bit of a mess. > >> > >> Two options for GBM users: > >> * call gbm_*_create_with_modifiers, it succeeds, call > >> gbm_bo_get_modifier, pass modifier into AddFB > >> * call gbm_*_create (without modifiers), it succeeds, do not call > >> gbm_bo_get_modifier, do not pass a modifier into AddFB > >> > >> Anything else is a bug in the user. Note that falling back from 1 to 2 > >> is fine: if `gbm_*_create_with_modifiers()` fails, you can fall back > >> to the non-modifier path, provided you don't later try to get a > >> modifier back out. > >> > >>> For a quick userspace fix that could probably be pushed out everywhere > >>> (Only affects Xorg server 1.20+ AFAIK), just retrying > >>> drmModeAddFB2WithModifiers() without the DRM_MODE_FB_MODIFIERS flag on > >>> failure should be sufficient. > >> > >> This would break other drivers. > > > > I think this could be done in a way that wouldn't, though it wouldn't be > > quite as simple. Let's see what the true root cause is first though. > > > >>> Still need to verify as I'm having > >>> trouble wrangling my Xorg build at the moment and I'm pressed for time. > >>> A more complete fix would be quite involved, as modesetting isn't really > >>> properly plumbed to validate GBM's modifiers against KMS planes, and it > >>> doesn't seem like GBM/Mesa/DRI should be responsible for this as noted > >>> above given the general modifier workflow/design. > >>> > >>> Most importantly, options I've considered for fixing from the kernel > >>> side: > >>> > >>> -Accept "legacy" modifiers in nouveau in addition to the new modifiers, > >>> though avoid reporting them to userspace as supported to avoid further > >>> proliferation. This is pretty straightforward. I'll need to modify > >>> both the AddFB2 handler (nouveau_validate_decode_mod) and the mode set > >>> plane validation logic (nv50_plane_format_mod_supported), but it should > >>> end up just being a few lines of code. > >> > >> I do think that they should also be reported to userspace if they are > >> accepted. Other users can and do look at the modifier list to see if > >> the buffer is acceptable for a given plane, so the consistency is good > >> here. Of course, in Mesa you would want to prioritise the new > >> modifiers over the legacy ones, and not allocate or return the legacy > >> ones unless that was all you were asked for. This would involve > >> tracking the used modifier explicitly through Mesa, rather than > >> throwing it away at alloc time and then later divining it from the > >> tiling mode. > > > > Reporting them as supported is equivalent to reporting support for a > > memory layout the chips don't actually support (It corresponds to a > > valid layout on Tegra chips, but not on discrete NV chips). This is > > what the new modifiers are trying to avoid in the first place: Implying > > buffers can be shared between these Tegra chips and discrete NV GPUs. > > > > Thanks, > > -James > > > >> Cheers, > >> Daniel > >> > > nouveau: Accept 'legacy' format modifiers > > Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK() > family of modifiers to handle broken userspace > Xorg modesetting and Mesa drivers. > --- > drivers/gpu/drm/nouveau/nouveau_display.c | 26 +++++++++++++++++++++-- > 1 file changed, 24 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c > b/drivers/gpu/drm/nouveau/nouveau_display.c > index 496c4621cc78..31543086254b 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_display.c > +++ b/drivers/gpu/drm/nouveau/nouveau_display.c > @@ -191,8 +191,14 @@ nouveau_decode_mod(struct nouveau_drm *drm, > uint32_t *tile_mode, > uint8_t *kind) > { > + struct nouveau_display *disp = nouveau_display(drm->dev); > BUG_ON(!tile_mode || !kind); > > + if ((modifier & (0xffull << 12)) == 0ull) { > + /* Legacy modifier. Translate to this device's 'kind.' */ > + modifier |= disp->format_modifiers[0] & (0xffull << 12); > + } > + > if (modifier == DRM_FORMAT_MOD_LINEAR) { > /* tile_mode will not be used in this case */ > *tile_mode = 0; > @@ -227,6 +233,16 @@ nouveau_framebuffer_get_layout(struct > drm_framebuffer *fb, > } > } > > +static const u64 legacy_modifiers[] = { > + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0), > + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1), > + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2), > + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3), > + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4), > + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5), > + DRM_FORMAT_MOD_INVALID > +}; > + > static int > nouveau_validate_decode_mod(struct nouveau_drm *drm, > uint64_t modifier, > @@ -247,8 +263,14 @@ nouveau_validate_decode_mod(struct nouveau_drm *drm, > (disp->format_modifiers[mod] != modifier); > mod++); > > - if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) > - return -EINVAL; > + if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) { > + for (mod = 0; > + (legacy_modifiers[mod] != DRM_FORMAT_MOD_INVALID) && > + (legacy_modifiers[mod] != modifier); > + mod++); > + if (legacy_modifiers[mod] == DRM_FORMAT_MOD_INVALID) > + return -EINVAL; > + } > > nouveau_decode_mod(drm, modifier, tile_mode, kind); > > -- > 2.17.1 >
How are we going with a fix for this regression I can commit? Dave.
Still testing. I'll get a Sign-off version out this week unless I find a problem. Thanks, -James On 7/12/20 6:37 PM, Dave Airlie wrote: > How are we going with a fix for this regression I can commit? > > Dave. >
Hi James, I don't know if you knew, but on the Jetson nano we had the issue for quite some time, that GLX/EGL through mesa on X was broken due to some fix in mesa related to modifiers. And I was wondering if the overall state just caused the issue we saw here and wanted to know what branches/patches I needed for the various projects to see if the work you have been doing since the last upstream nouveau regression would be of any help here? Mind pointing me towards everything I'd need to check that? I'd really like to fix this, but didn't have the time to investigate what the core problem here was, but I think it's very likely that a fixed/improved modifier support could actually fix it as well. Alternately I'd like to move to kmsro in mesa as this fixes it as well, but that could just be by coincidence and would break other devices.. Thanks On Tue, Jul 14, 2020 at 4:32 PM James Jones <jajones@nvidia.com> wrote: > > Still testing. I'll get a Sign-off version out this week unless I find > a problem. > > Thanks, > -James > > On 7/12/20 6:37 PM, Dave Airlie wrote: > > How are we going with a fix for this regression I can commit? > > > > Dave. > > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel >
Sorry for the slow reply here as well. I've been in the process of rebasing and reworking the userspace patches. I'm not clear my changes will address the Jetson Nano issue, but if you'd like to try them, the latest userspace changes are available here: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3724 And the tegra-drm kernel patches are here: https://patchwork.ozlabs.org/project/linux-tegra/patch/20191217005205.2573-1-jajones@nvidia.com/ Those + the kernel changes addressed in this thread are everything I had outstanding. Thanks, -James On 8/4/20 1:58 AM, Karol Herbst wrote: > Hi James, > > I don't know if you knew, but on the Jetson nano we had the issue for > quite some time, that GLX/EGL through mesa on X was broken due to some > fix in mesa related to modifiers. > > And I was wondering if the overall state just caused the issue we saw > here and wanted to know what branches/patches I needed for the various > projects to see if the work you have been doing since the last > upstream nouveau regression would be of any help here? > > Mind pointing me towards everything I'd need to check that? > > I'd really like to fix this, but didn't have the time to investigate > what the core problem here was, but I think it's very likely that a > fixed/improved modifier support could actually fix it as well. > Alternately I'd like to move to kmsro in mesa as this fixes it as > well, but that could just be by coincidence and would break other > devices.. > > Thanks > > On Tue, Jul 14, 2020 at 4:32 PM James Jones <jajones@nvidia.com> wrote: >> >> Still testing. I'll get a Sign-off version out this week unless I find >> a problem. >> >> Thanks, >> -James >> >> On 7/12/20 6:37 PM, Dave Airlie wrote: >>> How are we going with a fix for this regression I can commit? >>> >>> Dave. >>> >> _______________________________________________ >> dri-devel mailing list >> dri-devel@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/dri-devel >> >
On Wed, Aug 12, 2020 at 2:19 AM James Jones <jajones@nvidia.com> wrote: > > Sorry for the slow reply here as well. I've been in the process of > rebasing and reworking the userspace patches. I'm not clear my changes > will address the Jetson Nano issue, but if you'd like to try them, the > latest userspace changes are available here: > > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3724 > > And the tegra-drm kernel patches are here: > > > https://patchwork.ozlabs.org/project/linux-tegra/patch/20191217005205.2573-1-jajones@nvidia.com/ > > Those + the kernel changes addressed in this thread are everything I had > outstanding. > I don't know if that's caused by your changes or not, but now the assert I hit is a different one pointing out that nvc0_miptree_select_best_modifier fails in a certain case and returns MOD_INVALID... anyway, it seems like with your patches applied it's now way easier to debug and figure out what's going wrong, so maybe I can figure it out now :) > Thanks, > -James > > On 8/4/20 1:58 AM, Karol Herbst wrote: > > Hi James, > > > > I don't know if you knew, but on the Jetson nano we had the issue for > > quite some time, that GLX/EGL through mesa on X was broken due to some > > fix in mesa related to modifiers. > > > > And I was wondering if the overall state just caused the issue we saw > > here and wanted to know what branches/patches I needed for the various > > projects to see if the work you have been doing since the last > > upstream nouveau regression would be of any help here? > > > > Mind pointing me towards everything I'd need to check that? > > > > I'd really like to fix this, but didn't have the time to investigate > > what the core problem here was, but I think it's very likely that a > > fixed/improved modifier support could actually fix it as well. > > Alternately I'd like to move to kmsro in mesa as this fixes it as > > well, but that could just be by coincidence and would break other > > devices.. > > > > Thanks > > > > On Tue, Jul 14, 2020 at 4:32 PM James Jones <jajones@nvidia.com> wrote: > >> > >> Still testing. I'll get a Sign-off version out this week unless I find > >> a problem. > >> > >> Thanks, > >> -James > >> > >> On 7/12/20 6:37 PM, Dave Airlie wrote: > >>> How are we going with a fix for this regression I can commit? > >>> > >>> Dave. > >>> > >> _______________________________________________ > >> dri-devel mailing list > >> dri-devel@lists.freedesktop.org > >> https://lists.freedesktop.org/mailman/listinfo/dri-devel > >> > > >
On Wed, Aug 12, 2020 at 12:27 PM Karol Herbst <kherbst@redhat.com> wrote: > > On Wed, Aug 12, 2020 at 2:19 AM James Jones <jajones@nvidia.com> wrote: > > > > Sorry for the slow reply here as well. I've been in the process of > > rebasing and reworking the userspace patches. I'm not clear my changes > > will address the Jetson Nano issue, but if you'd like to try them, the > > latest userspace changes are available here: > > > > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3724 > > > > And the tegra-drm kernel patches are here: > > > > > > https://patchwork.ozlabs.org/project/linux-tegra/patch/20191217005205.2573-1-jajones@nvidia.com/ > > > > Those + the kernel changes addressed in this thread are everything I had > > outstanding. > > > > I don't know if that's caused by your changes or not, but now the > assert I hit is a different one pointing out that > nvc0_miptree_select_best_modifier fails in a certain case and returns > MOD_INVALID... anyway, it seems like with your patches applied it's > now way easier to debug and figure out what's going wrong, so maybe I > can figure it out now :) > collected some information which might help to track it down. src/gallium/frontends/dri/dri2.c:648 is the assert hit: assert(*zsbuf) templ is {reference = {count = 0}, width0 = 300, height0 = 300, depth0 = 1, array_size = 1, format = PIPE_FORMAT_Z24X8_UNORM, target = PIPE_TEXTURE_2D, last_level = 0, nr_samples = 0, nr_storage_samples = 0, usage = 0, bind = 1, flags = 0, next = 0x0, screen = 0x0} inside tegra_screen_resource_create modifier says DRM_FORMAT_MOD_INVALID as template->bind is 1 and nvc0_miptree_select_best_modifier returns DRM_FORMAT_MOD_INVALID, so the call just returns NULL leading to the assert. Btw, this is on Xorg-1.20.8-1.fc32.aarch64 with glxgears. > > Thanks, > > -James > > > > On 8/4/20 1:58 AM, Karol Herbst wrote: > > > Hi James, > > > > > > I don't know if you knew, but on the Jetson nano we had the issue for > > > quite some time, that GLX/EGL through mesa on X was broken due to some > > > fix in mesa related to modifiers. > > > > > > And I was wondering if the overall state just caused the issue we saw > > > here and wanted to know what branches/patches I needed for the various > > > projects to see if the work you have been doing since the last > > > upstream nouveau regression would be of any help here? > > > > > > Mind pointing me towards everything I'd need to check that? > > > > > > I'd really like to fix this, but didn't have the time to investigate > > > what the core problem here was, but I think it's very likely that a > > > fixed/improved modifier support could actually fix it as well. > > > Alternately I'd like to move to kmsro in mesa as this fixes it as > > > well, but that could just be by coincidence and would break other > > > devices.. > > > > > > Thanks > > > > > > On Tue, Jul 14, 2020 at 4:32 PM James Jones <jajones@nvidia.com> wrote: > > >> > > >> Still testing. I'll get a Sign-off version out this week unless I find > > >> a problem. > > >> > > >> Thanks, > > >> -James > > >> > > >> On 7/12/20 6:37 PM, Dave Airlie wrote: > > >>> How are we going with a fix for this regression I can commit? > > >>> > > >>> Dave. > > >>> > > >> _______________________________________________ > > >> dri-devel mailing list > > >> dri-devel@lists.freedesktop.org > > >> https://lists.freedesktop.org/mailman/listinfo/dri-devel > > >> > > > > >
On Wed, Aug 12, 2020 at 12:43 PM Karol Herbst <kherbst@redhat.com> wrote: > > On Wed, Aug 12, 2020 at 12:27 PM Karol Herbst <kherbst@redhat.com> wrote: > > > > On Wed, Aug 12, 2020 at 2:19 AM James Jones <jajones@nvidia.com> wrote: > > > > > > Sorry for the slow reply here as well. I've been in the process of > > > rebasing and reworking the userspace patches. I'm not clear my changes > > > will address the Jetson Nano issue, but if you'd like to try them, the > > > latest userspace changes are available here: > > > > > > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3724 > > > > > > And the tegra-drm kernel patches are here: > > > > > > > > > https://patchwork.ozlabs.org/project/linux-tegra/patch/20191217005205.2573-1-jajones@nvidia.com/ > > > > > > Those + the kernel changes addressed in this thread are everything I had > > > outstanding. > > > > > > > I don't know if that's caused by your changes or not, but now the > > assert I hit is a different one pointing out that > > nvc0_miptree_select_best_modifier fails in a certain case and returns > > MOD_INVALID... anyway, it seems like with your patches applied it's > > now way easier to debug and figure out what's going wrong, so maybe I > > can figure it out now :) > > > > collected some information which might help to track it down. > > src/gallium/frontends/dri/dri2.c:648 is the assert hit: assert(*zsbuf) > > templ is {reference = {count = 0}, width0 = 300, height0 = 300, depth0 > = 1, array_size = 1, format = PIPE_FORMAT_Z24X8_UNORM, target = > PIPE_TEXTURE_2D, last_level = 0, nr_samples = 0, nr_storage_samples = > 0, usage = 0, bind = 1, flags = 0, next = 0x0, screen = 0x0} > > inside tegra_screen_resource_create modifier says > DRM_FORMAT_MOD_INVALID as template->bind is 1 > > and nvc0_miptree_select_best_modifier returns DRM_FORMAT_MOD_INVALID, > so the call just returns NULL leading to the assert. > > Btw, this is on Xorg-1.20.8-1.fc32.aarch64 with glxgears. > So I digged a bit deeper and here is what tripps it of: when the context gets made current, the normal framebuffer validation and render buffer allocation is done, but we end up inside tegra_screen_resource_create at some point with PIPE_BIND_SCANOUT set in template->bind. Now the tegra driver forces the DRM_FORMAT_MOD_LINEAR modifier and calls into resource_create_with_modifiers. If it wouldn't do that, nouveau would allocate a tiled buffer, with that it's linear and we at some point end up with an assert about a depth_stencil buffer being there even though it shouldn't. If I always use DRM_FORMAT_MOD_INVALID in tegra_screen_resource_create, things just work. That's kind of the cause I pinpointed the issue down to. But I have no idea what's supposed to happen and what the actual bug is. > > > Thanks, > > > -James > > > > > > On 8/4/20 1:58 AM, Karol Herbst wrote: > > > > Hi James, > > > > > > > > I don't know if you knew, but on the Jetson nano we had the issue for > > > > quite some time, that GLX/EGL through mesa on X was broken due to some > > > > fix in mesa related to modifiers. > > > > > > > > And I was wondering if the overall state just caused the issue we saw > > > > here and wanted to know what branches/patches I needed for the various > > > > projects to see if the work you have been doing since the last > > > > upstream nouveau regression would be of any help here? > > > > > > > > Mind pointing me towards everything I'd need to check that? > > > > > > > > I'd really like to fix this, but didn't have the time to investigate > > > > what the core problem here was, but I think it's very likely that a > > > > fixed/improved modifier support could actually fix it as well. > > > > Alternately I'd like to move to kmsro in mesa as this fixes it as > > > > well, but that could just be by coincidence and would break other > > > > devices.. > > > > > > > > Thanks > > > > > > > > On Tue, Jul 14, 2020 at 4:32 PM James Jones <jajones@nvidia.com> wrote: > > > >> > > > >> Still testing. I'll get a Sign-off version out this week unless I find > > > >> a problem. > > > >> > > > >> Thanks, > > > >> -James > > > >> > > > >> On 7/12/20 6:37 PM, Dave Airlie wrote: > > > >>> How are we going with a fix for this regression I can commit? > > > >>> > > > >>> Dave. > > > >>> > > > >> _______________________________________________ > > > >> dri-devel mailing list > > > >> dri-devel@lists.freedesktop.org > > > >> https://lists.freedesktop.org/mailman/listinfo/dri-devel > > > >> > > > > > > >
On Wed, Aug 12, 2020 at 8:24 AM Karol Herbst <kherbst@redhat.com> wrote: > > On Wed, Aug 12, 2020 at 12:43 PM Karol Herbst <kherbst@redhat.com> wrote: > > > > On Wed, Aug 12, 2020 at 12:27 PM Karol Herbst <kherbst@redhat.com> wrote: > > > > > > On Wed, Aug 12, 2020 at 2:19 AM James Jones <jajones@nvidia.com> wrote: > > > > > > > > Sorry for the slow reply here as well. I've been in the process of > > > > rebasing and reworking the userspace patches. I'm not clear my changes > > > > will address the Jetson Nano issue, but if you'd like to try them, the > > > > latest userspace changes are available here: > > > > > > > > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3724 > > > > > > > > And the tegra-drm kernel patches are here: > > > > > > > > > > > > https://patchwork.ozlabs.org/project/linux-tegra/patch/20191217005205.2573-1-jajones@nvidia.com/ > > > > > > > > Those + the kernel changes addressed in this thread are everything I had > > > > outstanding. > > > > > > > > > > I don't know if that's caused by your changes or not, but now the > > > assert I hit is a different one pointing out that > > > nvc0_miptree_select_best_modifier fails in a certain case and returns > > > MOD_INVALID... anyway, it seems like with your patches applied it's > > > now way easier to debug and figure out what's going wrong, so maybe I > > > can figure it out now :) > > > > > > > collected some information which might help to track it down. > > > > src/gallium/frontends/dri/dri2.c:648 is the assert hit: assert(*zsbuf) > > > > templ is {reference = {count = 0}, width0 = 300, height0 = 300, depth0 > > = 1, array_size = 1, format = PIPE_FORMAT_Z24X8_UNORM, target = > > PIPE_TEXTURE_2D, last_level = 0, nr_samples = 0, nr_storage_samples = > > 0, usage = 0, bind = 1, flags = 0, next = 0x0, screen = 0x0} > > > > inside tegra_screen_resource_create modifier says > > DRM_FORMAT_MOD_INVALID as template->bind is 1 > > > > and nvc0_miptree_select_best_modifier returns DRM_FORMAT_MOD_INVALID, > > so the call just returns NULL leading to the assert. > > > > Btw, this is on Xorg-1.20.8-1.fc32.aarch64 with glxgears. > > > > So I digged a bit deeper and here is what tripps it of: > > when the context gets made current, the normal framebuffer validation > and render buffer allocation is done, but we end up inside > tegra_screen_resource_create at some point with PIPE_BIND_SCANOUT set > in template->bind. Now the tegra driver forces the > DRM_FORMAT_MOD_LINEAR modifier and calls into > resource_create_with_modifiers. > > If it wouldn't do that, nouveau would allocate a tiled buffer, with > that it's linear and we at some point end up with an assert about a > depth_stencil buffer being there even though it shouldn't. If I always > use DRM_FORMAT_MOD_INVALID in tegra_screen_resource_create, things > just work. > > That's kind of the cause I pinpointed the issue down to. But I have no > idea what's supposed to happen and what the actual bug is. Yeah, the bug with tegra has always been "trying to render to linear color + tiled depth", which the hardware plain doesn't support. (And linear depth isn't a thing.) Question is whether what it's doing necessary. PIPE_BIND_SCANOUT (/linear) requirements are needed for DRI2 to work (well, maybe not in theory, but at least in practice the nouveau ddx expects linear buffers). However tegra operates on a more DRI3-like basis, so with "client" allocations, tiled should work OK as long as there's something in tegra to copy it to linear when necessary? -ilia
On Wed, Aug 12, 2020 at 12:43:17PM +0200, Karol Herbst wrote: > On Wed, Aug 12, 2020 at 12:27 PM Karol Herbst <kherbst@redhat.com> wrote: > > > > On Wed, Aug 12, 2020 at 2:19 AM James Jones <jajones@nvidia.com> wrote: > > > > > > Sorry for the slow reply here as well. I've been in the process of > > > rebasing and reworking the userspace patches. I'm not clear my changes > > > will address the Jetson Nano issue, but if you'd like to try them, the > > > latest userspace changes are available here: > > > > > > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3724 > > > > > > And the tegra-drm kernel patches are here: > > > > > > > > > https://patchwork.ozlabs.org/project/linux-tegra/patch/20191217005205.2573-1-jajones@nvidia.com/ > > > > > > Those + the kernel changes addressed in this thread are everything I had > > > outstanding. > > > > > > > I don't know if that's caused by your changes or not, but now the > > assert I hit is a different one pointing out that > > nvc0_miptree_select_best_modifier fails in a certain case and returns > > MOD_INVALID... anyway, it seems like with your patches applied it's > > now way easier to debug and figure out what's going wrong, so maybe I > > can figure it out now :) > > > > collected some information which might help to track it down. > > src/gallium/frontends/dri/dri2.c:648 is the assert hit: assert(*zsbuf) > > templ is {reference = {count = 0}, width0 = 300, height0 = 300, depth0 > = 1, array_size = 1, format = PIPE_FORMAT_Z24X8_UNORM, target = > PIPE_TEXTURE_2D, last_level = 0, nr_samples = 0, nr_storage_samples = > 0, usage = 0, bind = 1, flags = 0, next = 0x0, screen = 0x0} > > inside tegra_screen_resource_create modifier says > DRM_FORMAT_MOD_INVALID as template->bind is 1 > > and nvc0_miptree_select_best_modifier returns DRM_FORMAT_MOD_INVALID, > so the call just returns NULL leading to the assert. > > Btw, this is on Xorg-1.20.8-1.fc32.aarch64 with glxgears. Hi Karol, I'm not sure if I'm doing something wrong, but I can't seem to reproduce these assertions at all. I've got a debug Mesa build from today as well as an X server build from today and if I run glxgears it works just fine for me. This is on Jetson TX1, but I don't think there's any difference to Jetson Nano in that regard. I'll try to give this a try on Jetson Nano as well, just to make sure. I can also try to pull in James' patches to see if they change anything for me. However, perhaps we can compare notes on what exactly your configuration is so that perhaps I can reproduce and take a closer look at what's going on. My Mesa build uses the following configuration: $ meson --prefix /usr --libexecdir /usr/lib --buildtype debug -Dgles1=false \ -Dgallium-drivers=nouveau,swrast,tegra -Dgallium-opencl=standalone \ -Dvulkan-drivers='' -Dplatforms=wayland,x11,drm,surfaceless \ -Dbuild-tests=true -Dtexture-float=true -Ddri-drivers='' \ -Dgallium-omx=disabled -Dllvm=true And here's what I use for X: $ meson --prefix /usr --libexecdir /usr/lib \ --libexecdir /usr/lib/xorg-server -Dxdmcp=false I've stripped out some cross-compilation boilerplate there because that shouldn't be relevant. Do you see anything in there that I'm missing and which might be causing the issue not to happen for me? Also, what's the window manager that you use? I use TWM (for simplicity) and I suspect that's not what you use, so perhaps this is relevant somehow as well? Thierry
On Wed, Aug 12, 2020 at 5:05 PM Thierry Reding <thierry.reding@gmail.com> wrote: > > On Wed, Aug 12, 2020 at 12:43:17PM +0200, Karol Herbst wrote: > > On Wed, Aug 12, 2020 at 12:27 PM Karol Herbst <kherbst@redhat.com> wrote: > > > > > > On Wed, Aug 12, 2020 at 2:19 AM James Jones <jajones@nvidia.com> wrote: > > > > > > > > Sorry for the slow reply here as well. I've been in the process of > > > > rebasing and reworking the userspace patches. I'm not clear my changes > > > > will address the Jetson Nano issue, but if you'd like to try them, the > > > > latest userspace changes are available here: > > > > > > > > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3724 > > > > > > > > And the tegra-drm kernel patches are here: > > > > > > > > > > > > https://patchwork.ozlabs.org/project/linux-tegra/patch/20191217005205.2573-1-jajones@nvidia.com/ > > > > > > > > Those + the kernel changes addressed in this thread are everything I had > > > > outstanding. > > > > > > > > > > I don't know if that's caused by your changes or not, but now the > > > assert I hit is a different one pointing out that > > > nvc0_miptree_select_best_modifier fails in a certain case and returns > > > MOD_INVALID... anyway, it seems like with your patches applied it's > > > now way easier to debug and figure out what's going wrong, so maybe I > > > can figure it out now :) > > > > > > > collected some information which might help to track it down. > > > > src/gallium/frontends/dri/dri2.c:648 is the assert hit: assert(*zsbuf) > > > > templ is {reference = {count = 0}, width0 = 300, height0 = 300, depth0 > > = 1, array_size = 1, format = PIPE_FORMAT_Z24X8_UNORM, target = > > PIPE_TEXTURE_2D, last_level = 0, nr_samples = 0, nr_storage_samples = > > 0, usage = 0, bind = 1, flags = 0, next = 0x0, screen = 0x0} > > > > inside tegra_screen_resource_create modifier says > > DRM_FORMAT_MOD_INVALID as template->bind is 1 > > > > and nvc0_miptree_select_best_modifier returns DRM_FORMAT_MOD_INVALID, > > so the call just returns NULL leading to the assert. > > > > Btw, this is on Xorg-1.20.8-1.fc32.aarch64 with glxgears. > > Hi Karol, > > I'm not sure if I'm doing something wrong, but I can't seem to reproduce > these assertions at all. I've got a debug Mesa build from today as well > as an X server build from today and if I run glxgears it works just fine > for me. This is on Jetson TX1, but I don't think there's any difference > to Jetson Nano in that regard. I'll try to give this a try on Jetson > Nano as well, just to make sure. I can also try to pull in James' > patches to see if they change anything for me. > > However, perhaps we can compare notes on what exactly your configuration > is so that perhaps I can reproduce and take a closer look at what's > going on. > > My Mesa build uses the following configuration: > > $ meson --prefix /usr --libexecdir /usr/lib --buildtype debug -Dgles1=false \ > -Dgallium-drivers=nouveau,swrast,tegra -Dgallium-opencl=standalone \ > -Dvulkan-drivers='' -Dplatforms=wayland,x11,drm,surfaceless \ > -Dbuild-tests=true -Dtexture-float=true -Ddri-drivers='' \ > -Dgallium-omx=disabled -Dllvm=true > > And here's what I use for X: > > $ meson --prefix /usr --libexecdir /usr/lib \ > --libexecdir /usr/lib/xorg-server -Dxdmcp=false > > I've stripped out some cross-compilation boilerplate there because that > shouldn't be relevant. Do you see anything in there that I'm missing and > which might be causing the issue not to happen for me? > > Also, what's the window manager that you use? I use TWM (for simplicity) > and I suspect that's not what you use, so perhaps this is relevant > somehow as well? > I don't use any at all, just plain X. Anyway, for software versions: kernel-5.8.0 + patch James refered to above Xorg-1.20.8-1.fc32.aarch64 (just the normal fc32 build) mesa fedora, master or james modifier branch (james' branch is hitting a different assert, so maybe that could behave differently for you as well) but my meson args are those: -Dplatforms=auto -Dllvm=false -Dgallium-drivers="nouveau, tegra" -Dbuildtype=debug But I am seeing a bunch of messages in dmesg in a release build as well: [ 233.080649] nouveau 57000000.gpu: gr: DATA_ERROR 00000003 [INVALID_OPERATION] ch 4 [0400323000 glxgears[412]] subc 0 class b197 mthd 19d0 data 0000003d [ 233.094237] nouveau 57000000.gpu: gr: DATA_ERROR 0000009c [] ch 4 [0400323000 glxgears[412]] subc 0 class b197 mthd 0d78 data 00000052 [ 233.106327] nouveau 57000000.gpu: gr: DATA_ERROR 0000009c [] ch 4 [0400323000 glxgears[412]] subc 0 class b197 mthd 0d78 data 00000050 But at the moment I kind of expect Xorg to be the difference. I will try with Xorg from git and see if it goes away. > Thierry
On Wed, Aug 12, 2020 at 5:20 PM Karol Herbst <kherbst@redhat.com> wrote: > > On Wed, Aug 12, 2020 at 5:05 PM Thierry Reding <thierry.reding@gmail.com> wrote: > > > > On Wed, Aug 12, 2020 at 12:43:17PM +0200, Karol Herbst wrote: > > > On Wed, Aug 12, 2020 at 12:27 PM Karol Herbst <kherbst@redhat.com> wrote: > > > > > > > > On Wed, Aug 12, 2020 at 2:19 AM James Jones <jajones@nvidia.com> wrote: > > > > > > > > > > Sorry for the slow reply here as well. I've been in the process of > > > > > rebasing and reworking the userspace patches. I'm not clear my changes > > > > > will address the Jetson Nano issue, but if you'd like to try them, the > > > > > latest userspace changes are available here: > > > > > > > > > > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3724 > > > > > > > > > > And the tegra-drm kernel patches are here: > > > > > > > > > > > > > > > https://patchwork.ozlabs.org/project/linux-tegra/patch/20191217005205.2573-1-jajones@nvidia.com/ > > > > > > > > > > Those + the kernel changes addressed in this thread are everything I had > > > > > outstanding. > > > > > > > > > > > > > I don't know if that's caused by your changes or not, but now the > > > > assert I hit is a different one pointing out that > > > > nvc0_miptree_select_best_modifier fails in a certain case and returns > > > > MOD_INVALID... anyway, it seems like with your patches applied it's > > > > now way easier to debug and figure out what's going wrong, so maybe I > > > > can figure it out now :) > > > > > > > > > > collected some information which might help to track it down. > > > > > > src/gallium/frontends/dri/dri2.c:648 is the assert hit: assert(*zsbuf) > > > > > > templ is {reference = {count = 0}, width0 = 300, height0 = 300, depth0 > > > = 1, array_size = 1, format = PIPE_FORMAT_Z24X8_UNORM, target = > > > PIPE_TEXTURE_2D, last_level = 0, nr_samples = 0, nr_storage_samples = > > > 0, usage = 0, bind = 1, flags = 0, next = 0x0, screen = 0x0} > > > > > > inside tegra_screen_resource_create modifier says > > > DRM_FORMAT_MOD_INVALID as template->bind is 1 > > > > > > and nvc0_miptree_select_best_modifier returns DRM_FORMAT_MOD_INVALID, > > > so the call just returns NULL leading to the assert. > > > > > > Btw, this is on Xorg-1.20.8-1.fc32.aarch64 with glxgears. > > > > Hi Karol, > > > > I'm not sure if I'm doing something wrong, but I can't seem to reproduce > > these assertions at all. I've got a debug Mesa build from today as well > > as an X server build from today and if I run glxgears it works just fine > > for me. This is on Jetson TX1, but I don't think there's any difference > > to Jetson Nano in that regard. I'll try to give this a try on Jetson > > Nano as well, just to make sure. I can also try to pull in James' > > patches to see if they change anything for me. > > > > However, perhaps we can compare notes on what exactly your configuration > > is so that perhaps I can reproduce and take a closer look at what's > > going on. > > > > My Mesa build uses the following configuration: > > > > $ meson --prefix /usr --libexecdir /usr/lib --buildtype debug -Dgles1=false \ > > -Dgallium-drivers=nouveau,swrast,tegra -Dgallium-opencl=standalone \ > > -Dvulkan-drivers='' -Dplatforms=wayland,x11,drm,surfaceless \ > > -Dbuild-tests=true -Dtexture-float=true -Ddri-drivers='' \ > > -Dgallium-omx=disabled -Dllvm=true > > > > And here's what I use for X: > > > > $ meson --prefix /usr --libexecdir /usr/lib \ > > --libexecdir /usr/lib/xorg-server -Dxdmcp=false > > > > I've stripped out some cross-compilation boilerplate there because that > > shouldn't be relevant. Do you see anything in there that I'm missing and > > which might be causing the issue not to happen for me? > > > > Also, what's the window manager that you use? I use TWM (for simplicity) > > and I suspect that's not what you use, so perhaps this is relevant > > somehow as well? > > > > I don't use any at all, just plain X. > > Anyway, for software versions: > kernel-5.8.0 + patch James refered to above > Xorg-1.20.8-1.fc32.aarch64 (just the normal fc32 build) > > mesa fedora, master or james modifier branch (james' branch is hitting > a different assert, so maybe that could behave differently for you as > well) > but my meson args are those: -Dplatforms=auto -Dllvm=false > -Dgallium-drivers="nouveau, tegra" -Dbuildtype=debug > > But I am seeing a bunch of messages in dmesg in a release build as well: > [ 233.080649] nouveau 57000000.gpu: gr: DATA_ERROR 00000003 > [INVALID_OPERATION] ch 4 [0400323000 glxgears[412]] subc 0 class b197 > mthd 19d0 data 0000003d > [ 233.094237] nouveau 57000000.gpu: gr: DATA_ERROR 0000009c [] ch 4 > [0400323000 glxgears[412]] subc 0 class b197 mthd 0d78 data 00000052 > [ 233.106327] nouveau 57000000.gpu: gr: DATA_ERROR 0000009c [] ch 4 > [0400323000 glxgears[412]] subc 0 class b197 mthd 0d78 data 00000050 > > But at the moment I kind of expect Xorg to be the difference. I will > try with Xorg from git and see if it goes away. yeah... Xorg from git doesn't cause the errors/asserts. So.. what do we want to do? > > > Thierry
On 8/12/20 5:37 AM, Ilia Mirkin wrote: > On Wed, Aug 12, 2020 at 8:24 AM Karol Herbst <kherbst@redhat.com> wrote: >> >> On Wed, Aug 12, 2020 at 12:43 PM Karol Herbst <kherbst@redhat.com> wrote: >>> >>> On Wed, Aug 12, 2020 at 12:27 PM Karol Herbst <kherbst@redhat.com> wrote: >>>> >>>> On Wed, Aug 12, 2020 at 2:19 AM James Jones <jajones@nvidia.com> wrote: >>>>> >>>>> Sorry for the slow reply here as well. I've been in the process of >>>>> rebasing and reworking the userspace patches. I'm not clear my changes >>>>> will address the Jetson Nano issue, but if you'd like to try them, the >>>>> latest userspace changes are available here: >>>>> >>>>> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3724 >>>>> >>>>> And the tegra-drm kernel patches are here: >>>>> >>>>> >>>>> https://patchwork.ozlabs.org/project/linux-tegra/patch/20191217005205.2573-1-jajones@nvidia.com/ >>>>> >>>>> Those + the kernel changes addressed in this thread are everything I had >>>>> outstanding. >>>>> >>>> >>>> I don't know if that's caused by your changes or not, but now the >>>> assert I hit is a different one pointing out that >>>> nvc0_miptree_select_best_modifier fails in a certain case and returns >>>> MOD_INVALID... anyway, it seems like with your patches applied it's >>>> now way easier to debug and figure out what's going wrong, so maybe I >>>> can figure it out now :) >>>> >>> >>> collected some information which might help to track it down. >>> >>> src/gallium/frontends/dri/dri2.c:648 is the assert hit: assert(*zsbuf) >>> >>> templ is {reference = {count = 0}, width0 = 300, height0 = 300, depth0 >>> = 1, array_size = 1, format = PIPE_FORMAT_Z24X8_UNORM, target = >>> PIPE_TEXTURE_2D, last_level = 0, nr_samples = 0, nr_storage_samples = >>> 0, usage = 0, bind = 1, flags = 0, next = 0x0, screen = 0x0} >>> >>> inside tegra_screen_resource_create modifier says >>> DRM_FORMAT_MOD_INVALID as template->bind is 1 >>> >>> and nvc0_miptree_select_best_modifier returns DRM_FORMAT_MOD_INVALID, >>> so the call just returns NULL leading to the assert. >>> >>> Btw, this is on Xorg-1.20.8-1.fc32.aarch64 with glxgears. >>> >> >> So I digged a bit deeper and here is what tripps it of: >> >> when the context gets made current, the normal framebuffer validation >> and render buffer allocation is done, but we end up inside >> tegra_screen_resource_create at some point with PIPE_BIND_SCANOUT set >> in template->bind. Now the tegra driver forces the >> DRM_FORMAT_MOD_LINEAR modifier and calls into >> resource_create_with_modifiers. >> >> If it wouldn't do that, nouveau would allocate a tiled buffer, with >> that it's linear and we at some point end up with an assert about a >> depth_stencil buffer being there even though it shouldn't. If I always >> use DRM_FORMAT_MOD_INVALID in tegra_screen_resource_create, things >> just work. >> >> That's kind of the cause I pinpointed the issue down to. But I have no >> idea what's supposed to happen and what the actual bug is. > > Yeah, the bug with tegra has always been "trying to render to linear > color + tiled depth", which the hardware plain doesn't support. (And > linear depth isn't a thing.) > > Question is whether what it's doing necessary. PIPE_BIND_SCANOUT > (/linear) requirements are needed for DRI2 to work (well, maybe not in > theory, but at least in practice the nouveau ddx expects linear > buffers). However tegra operates on a more DRI3-like basis, so with > "client" allocations, tiled should work OK as long as there's > something in tegra to copy it to linear when necessary? I can confirm the above: Our hardware can't render to linear depth buffers, nor can it mix linear color buffers with block linear depth buffers. I think there's a misunderstanding on expected behavior of resource_create_with_modifiers() here too: tegra_screen_resource_create() is passing DRM_FORMAT_MOD_INVALID as the only modifier in non-scanout cases. Previously, I believe nouveau may have treated that as "no modifiers specified. Fall back to internal layout selection logic", but in my patches I "fixed" it to match other drivers' behavior, in that allocation will fail if that is the only modifier in the list, since it is equivalent to passing in a list containing only unsupported modifiers. To get fallback behavior, tegra_screen_resource_create() should pass in (NULL, 0) for (modifiers, count), or just call resource_create() on the underlying screen instead. Beyond that, I can only offer my thoughts based on analysis of the code referenced here so far: While I've learned from the origins of this thread applications/things external to Mesa in general shouldn't be querying format modifiers of buffers created without format modifiers, tegra is a Mesa internal component that already has some intimate knowledge of how the nouveau driver it sits on top of works. Nouveau will always be able to construct and return a valid format modifier for unorm single sampled color buffers (and hopefully, anything that can scan out going forward), both before and after my patches I believe, regardless of how they were allocated. After my patches, it should even work for things that can't scan out in theory. Hence, looking at this without knowledge of what motivated the original changes, it seems like tegra_screen_resource_create should just naively forward the resource_create() call, relying on nouveau to select a layout and provide a valid modifier when queried for import. As Karol notes, this works fine for at least this simple test case, and it's what nouveau itself would be doing with an equivalent callstack, excepting the modifier query, so I find it hard to believe it breaks some application behavior. It'll also end up being equivalent (in end result, not quite semantically) to what dri3_alloc_render_buffer() was doing prior to the patch mentioned that broke things for Karol, so certainly for the DRI3 usage it's the right behavior. Ilia, what in the nouveau DDX (As in Xfree86 DDX?) assumes linear buffers? It sounds like you don't think it will interact poorly with this path regardless? Thierry, do you recall what motivated the force-linear code here? As to why this works for Thierry and not Karol, that's confusing. Are you both using the same X11 DDX (modesetting I assume?) and X server versions? Could it be a difference in client-side DRI library code somehow? Thanks, -James > -ilia >
On Wed, Aug 12, 2020 at 7:03 PM James Jones <jajones@nvidia.com> wrote: > > On 8/12/20 5:37 AM, Ilia Mirkin wrote: > > On Wed, Aug 12, 2020 at 8:24 AM Karol Herbst <kherbst@redhat.com> wrote: > >> > >> On Wed, Aug 12, 2020 at 12:43 PM Karol Herbst <kherbst@redhat.com> wrote: > >>> > >>> On Wed, Aug 12, 2020 at 12:27 PM Karol Herbst <kherbst@redhat.com> wrote: > >>>> > >>>> On Wed, Aug 12, 2020 at 2:19 AM James Jones <jajones@nvidia.com> wrote: > >>>>> > >>>>> Sorry for the slow reply here as well. I've been in the process of > >>>>> rebasing and reworking the userspace patches. I'm not clear my changes > >>>>> will address the Jetson Nano issue, but if you'd like to try them, the > >>>>> latest userspace changes are available here: > >>>>> > >>>>> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3724 > >>>>> > >>>>> And the tegra-drm kernel patches are here: > >>>>> > >>>>> > >>>>> https://patchwork.ozlabs.org/project/linux-tegra/patch/20191217005205.2573-1-jajones@nvidia.com/ > >>>>> > >>>>> Those + the kernel changes addressed in this thread are everything I had > >>>>> outstanding. > >>>>> > >>>> > >>>> I don't know if that's caused by your changes or not, but now the > >>>> assert I hit is a different one pointing out that > >>>> nvc0_miptree_select_best_modifier fails in a certain case and returns > >>>> MOD_INVALID... anyway, it seems like with your patches applied it's > >>>> now way easier to debug and figure out what's going wrong, so maybe I > >>>> can figure it out now :) > >>>> > >>> > >>> collected some information which might help to track it down. > >>> > >>> src/gallium/frontends/dri/dri2.c:648 is the assert hit: assert(*zsbuf) > >>> > >>> templ is {reference = {count = 0}, width0 = 300, height0 = 300, depth0 > >>> = 1, array_size = 1, format = PIPE_FORMAT_Z24X8_UNORM, target = > >>> PIPE_TEXTURE_2D, last_level = 0, nr_samples = 0, nr_storage_samples = > >>> 0, usage = 0, bind = 1, flags = 0, next = 0x0, screen = 0x0} > >>> > >>> inside tegra_screen_resource_create modifier says > >>> DRM_FORMAT_MOD_INVALID as template->bind is 1 > >>> > >>> and nvc0_miptree_select_best_modifier returns DRM_FORMAT_MOD_INVALID, > >>> so the call just returns NULL leading to the assert. > >>> > >>> Btw, this is on Xorg-1.20.8-1.fc32.aarch64 with glxgears. > >>> > >> > >> So I digged a bit deeper and here is what tripps it of: > >> > >> when the context gets made current, the normal framebuffer validation > >> and render buffer allocation is done, but we end up inside > >> tegra_screen_resource_create at some point with PIPE_BIND_SCANOUT set > >> in template->bind. Now the tegra driver forces the > >> DRM_FORMAT_MOD_LINEAR modifier and calls into > >> resource_create_with_modifiers. > >> > >> If it wouldn't do that, nouveau would allocate a tiled buffer, with > >> that it's linear and we at some point end up with an assert about a > >> depth_stencil buffer being there even though it shouldn't. If I always > >> use DRM_FORMAT_MOD_INVALID in tegra_screen_resource_create, things > >> just work. > >> > >> That's kind of the cause I pinpointed the issue down to. But I have no > >> idea what's supposed to happen and what the actual bug is. > > > > Yeah, the bug with tegra has always been "trying to render to linear > > color + tiled depth", which the hardware plain doesn't support. (And > > linear depth isn't a thing.) > > > > Question is whether what it's doing necessary. PIPE_BIND_SCANOUT > > (/linear) requirements are needed for DRI2 to work (well, maybe not in > > theory, but at least in practice the nouveau ddx expects linear > > buffers). However tegra operates on a more DRI3-like basis, so with > > "client" allocations, tiled should work OK as long as there's > > something in tegra to copy it to linear when necessary? > > I can confirm the above: Our hardware can't render to linear depth > buffers, nor can it mix linear color buffers with block linear depth > buffers. > > I think there's a misunderstanding on expected behavior of > resource_create_with_modifiers() here too: > tegra_screen_resource_create() is passing DRM_FORMAT_MOD_INVALID as the > only modifier in non-scanout cases. Previously, I believe nouveau may > have treated that as "no modifiers specified. Fall back to internal > layout selection logic", but in my patches I "fixed" it to match other > drivers' behavior, in that allocation will fail if that is the only > modifier in the list, since it is equivalent to passing in a list > containing only unsupported modifiers. To get fallback behavior, > tegra_screen_resource_create() should pass in (NULL, 0) for (modifiers, > count), or just call resource_create() on the underlying screen instead. > > Beyond that, I can only offer my thoughts based on analysis of the code > referenced here so far: > > While I've learned from the origins of this thread applications/things > external to Mesa in general shouldn't be querying format modifiers of > buffers created without format modifiers, tegra is a Mesa internal > component that already has some intimate knowledge of how the nouveau > driver it sits on top of works. Nouveau will always be able to > construct and return a valid format modifier for unorm single sampled > color buffers (and hopefully, anything that can scan out going forward), > both before and after my patches I believe, regardless of how they were > allocated. After my patches, it should even work for things that can't > scan out in theory. Hence, looking at this without knowledge of what > motivated the original changes, it seems like > tegra_screen_resource_create should just naively forward the > resource_create() call, relying on nouveau to select a layout and > provide a valid modifier when queried for import. As Karol notes, this > works fine for at least this simple test case, and it's what nouveau > itself would be doing with an equivalent callstack, excepting the > modifier query, so I find it hard to believe it breaks some application > behavior. It'll also end up being equivalent (in end result, not quite > semantically) to what dri3_alloc_render_buffer() was doing prior to the > patch mentioned that broke things for Karol, so certainly for the DRI3 > usage it's the right behavior. > > Ilia, what in the nouveau DDX (As in Xfree86 DDX?) assumes linear > buffers? It sounds like you don't think it will interact poorly with > this path regardless? Thierry, do you recall what motivated the > force-linear code here? > > As to why this works for Thierry and not Karol, that's confusing. Are > you both using the same X11 DDX (modesetting I assume?) and X server > versions? Could it be a difference in client-side DRI library code somehow? > it's X. 1.20.99.1 works, 1.20.8 is broken. > Thanks, > -James > > > -ilia > > >
On 8/12/20 10:10 AM, Karol Herbst wrote: > On Wed, Aug 12, 2020 at 7:03 PM James Jones <jajones@nvidia.com> wrote: >> >> On 8/12/20 5:37 AM, Ilia Mirkin wrote: >>> On Wed, Aug 12, 2020 at 8:24 AM Karol Herbst <kherbst@redhat.com> wrote: >>>> >>>> On Wed, Aug 12, 2020 at 12:43 PM Karol Herbst <kherbst@redhat.com> wrote: >>>>> >>>>> On Wed, Aug 12, 2020 at 12:27 PM Karol Herbst <kherbst@redhat.com> wrote: >>>>>> >>>>>> On Wed, Aug 12, 2020 at 2:19 AM James Jones <jajones@nvidia.com> wrote: >>>>>>> >>>>>>> Sorry for the slow reply here as well. I've been in the process of >>>>>>> rebasing and reworking the userspace patches. I'm not clear my changes >>>>>>> will address the Jetson Nano issue, but if you'd like to try them, the >>>>>>> latest userspace changes are available here: >>>>>>> >>>>>>> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3724 >>>>>>> >>>>>>> And the tegra-drm kernel patches are here: >>>>>>> >>>>>>> >>>>>>> https://patchwork.ozlabs.org/project/linux-tegra/patch/20191217005205.2573-1-jajones@nvidia.com/ >>>>>>> >>>>>>> Those + the kernel changes addressed in this thread are everything I had >>>>>>> outstanding. >>>>>>> >>>>>> >>>>>> I don't know if that's caused by your changes or not, but now the >>>>>> assert I hit is a different one pointing out that >>>>>> nvc0_miptree_select_best_modifier fails in a certain case and returns >>>>>> MOD_INVALID... anyway, it seems like with your patches applied it's >>>>>> now way easier to debug and figure out what's going wrong, so maybe I >>>>>> can figure it out now :) >>>>>> >>>>> >>>>> collected some information which might help to track it down. >>>>> >>>>> src/gallium/frontends/dri/dri2.c:648 is the assert hit: assert(*zsbuf) >>>>> >>>>> templ is {reference = {count = 0}, width0 = 300, height0 = 300, depth0 >>>>> = 1, array_size = 1, format = PIPE_FORMAT_Z24X8_UNORM, target = >>>>> PIPE_TEXTURE_2D, last_level = 0, nr_samples = 0, nr_storage_samples = >>>>> 0, usage = 0, bind = 1, flags = 0, next = 0x0, screen = 0x0} >>>>> >>>>> inside tegra_screen_resource_create modifier says >>>>> DRM_FORMAT_MOD_INVALID as template->bind is 1 >>>>> >>>>> and nvc0_miptree_select_best_modifier returns DRM_FORMAT_MOD_INVALID, >>>>> so the call just returns NULL leading to the assert. >>>>> >>>>> Btw, this is on Xorg-1.20.8-1.fc32.aarch64 with glxgears. >>>>> >>>> >>>> So I digged a bit deeper and here is what tripps it of: >>>> >>>> when the context gets made current, the normal framebuffer validation >>>> and render buffer allocation is done, but we end up inside >>>> tegra_screen_resource_create at some point with PIPE_BIND_SCANOUT set >>>> in template->bind. Now the tegra driver forces the >>>> DRM_FORMAT_MOD_LINEAR modifier and calls into >>>> resource_create_with_modifiers. >>>> >>>> If it wouldn't do that, nouveau would allocate a tiled buffer, with >>>> that it's linear and we at some point end up with an assert about a >>>> depth_stencil buffer being there even though it shouldn't. If I always >>>> use DRM_FORMAT_MOD_INVALID in tegra_screen_resource_create, things >>>> just work. >>>> >>>> That's kind of the cause I pinpointed the issue down to. But I have no >>>> idea what's supposed to happen and what the actual bug is. >>> >>> Yeah, the bug with tegra has always been "trying to render to linear >>> color + tiled depth", which the hardware plain doesn't support. (And >>> linear depth isn't a thing.) >>> >>> Question is whether what it's doing necessary. PIPE_BIND_SCANOUT >>> (/linear) requirements are needed for DRI2 to work (well, maybe not in >>> theory, but at least in practice the nouveau ddx expects linear >>> buffers). However tegra operates on a more DRI3-like basis, so with >>> "client" allocations, tiled should work OK as long as there's >>> something in tegra to copy it to linear when necessary? >> >> I can confirm the above: Our hardware can't render to linear depth >> buffers, nor can it mix linear color buffers with block linear depth >> buffers. >> >> I think there's a misunderstanding on expected behavior of >> resource_create_with_modifiers() here too: >> tegra_screen_resource_create() is passing DRM_FORMAT_MOD_INVALID as the >> only modifier in non-scanout cases. Previously, I believe nouveau may >> have treated that as "no modifiers specified. Fall back to internal >> layout selection logic", but in my patches I "fixed" it to match other >> drivers' behavior, in that allocation will fail if that is the only >> modifier in the list, since it is equivalent to passing in a list >> containing only unsupported modifiers. To get fallback behavior, >> tegra_screen_resource_create() should pass in (NULL, 0) for (modifiers, >> count), or just call resource_create() on the underlying screen instead. ...and in merging my code with Alyssa's new panfrost format modifier support, I see panfrost does the opposite of this and treats a format modifier list of only INVALID as "don't care". I modeled the new nouveau behavior on the Iris driver. Now I'm not sure which is correct :-( Thanks, -James >> Beyond that, I can only offer my thoughts based on analysis of the code >> referenced here so far: >> >> While I've learned from the origins of this thread applications/things >> external to Mesa in general shouldn't be querying format modifiers of >> buffers created without format modifiers, tegra is a Mesa internal >> component that already has some intimate knowledge of how the nouveau >> driver it sits on top of works. Nouveau will always be able to >> construct and return a valid format modifier for unorm single sampled >> color buffers (and hopefully, anything that can scan out going forward), >> both before and after my patches I believe, regardless of how they were >> allocated. After my patches, it should even work for things that can't >> scan out in theory. Hence, looking at this without knowledge of what >> motivated the original changes, it seems like >> tegra_screen_resource_create should just naively forward the >> resource_create() call, relying on nouveau to select a layout and >> provide a valid modifier when queried for import. As Karol notes, this >> works fine for at least this simple test case, and it's what nouveau >> itself would be doing with an equivalent callstack, excepting the >> modifier query, so I find it hard to believe it breaks some application >> behavior. It'll also end up being equivalent (in end result, not quite >> semantically) to what dri3_alloc_render_buffer() was doing prior to the >> patch mentioned that broke things for Karol, so certainly for the DRI3 >> usage it's the right behavior. >> >> Ilia, what in the nouveau DDX (As in Xfree86 DDX?) assumes linear >> buffers? It sounds like you don't think it will interact poorly with >> this path regardless? Thierry, do you recall what motivated the >> force-linear code here? >> >> As to why this works for Thierry and not Karol, that's confusing. Are >> you both using the same X11 DDX (modesetting I assume?) and X server >> versions? Could it be a difference in client-side DRI library code somehow? >> > > it's X. 1.20.99.1 works, 1.20.8 is broken. > >> Thanks, >> -James >> >>> -ilia >>> >> >
> ...and in merging my code with Alyssa's new panfrost format modifier > support, I see panfrost does the opposite of this and treats a format > modifier list of only INVALID as "don't care". I modeled the new nouveau > behavior on the Iris driver. Now I'm not sure which is correct :-( ....and neither am I. Uh-oh. I modeled the panfrost code after v3d_resource_create_with_modifiers, which treats INVALID as "don't care". I can confirm the panfrost code works (in the sense that it's functional on the machines I've tested), but I don't know if it is actually correct. I think it is, since otherwise you end up using linear in places it's unnecessary, but I'm not sure where this is spec'd.
On 8/12/20 10:40 AM, Alyssa Rosenzweig wrote: >> ...and in merging my code with Alyssa's new panfrost format modifier >> support, I see panfrost does the opposite of this and treats a format >> modifier list of only INVALID as "don't care". I modeled the new nouveau >> behavior on the Iris driver. Now I'm not sure which is correct :-( > > ....and neither am I. Uh-oh. > > I modeled the panfrost code after v3d_resource_create_with_modifiers, > which treats INVALID as "don't care". I can confirm the panfrost code > works (in the sense that it's functional on the machines I've tested), > but I don't know if it is actually correct. I think it is, since > otherwise you end up using linear in places it's unnecessary, but I'm > not sure where this is spec'd. It would depend on whether an app actually calls the function this way, and whether that app was tested I suppose. If I'm interpreting the Iris code correctly and it doesn't break anything, then I'm assuming both implementations are equally valid in that nothing exercises this path, but it would be good to have the intended behavior documented somewhere so we can try to work towards consistent in case someone tries it in the future. My nouveau change runs afoul of assumptions in the tegra driver, but that's easy enough to fix in lockstep if desired. Also, heads up: I'll ping you on my format modifier cleanup MR once I've pushed the latest version. The panfrost modifier usage was harder to merge into the refactoring than most, so it'll be good to have your review and if you have time, some testing. I think I landed on an elegant solution, but open to suggestions. Thanks, -James
in case you all were wondering, it works on xorg-server git because of this commit: https://gitlab.freedesktop.org/xorg/xserver/-/commit/9b8999411033c9473cd68e92e4690a91aecf5b95 On Wed, Aug 12, 2020 at 8:25 PM James Jones <jajones@nvidia.com> wrote: > > On 8/12/20 10:40 AM, Alyssa Rosenzweig wrote: > >> ...and in merging my code with Alyssa's new panfrost format modifier > >> support, I see panfrost does the opposite of this and treats a format > >> modifier list of only INVALID as "don't care". I modeled the new nouveau > >> behavior on the Iris driver. Now I'm not sure which is correct :-( > > > > ....and neither am I. Uh-oh. > > > > I modeled the panfrost code after v3d_resource_create_with_modifiers, > > which treats INVALID as "don't care". I can confirm the panfrost code > > works (in the sense that it's functional on the machines I've tested), > > but I don't know if it is actually correct. I think it is, since > > otherwise you end up using linear in places it's unnecessary, but I'm > > not sure where this is spec'd. > > It would depend on whether an app actually calls the function this way, > and whether that app was tested I suppose. If I'm interpreting the Iris > code correctly and it doesn't break anything, then I'm assuming both > implementations are equally valid in that nothing exercises this path, > but it would be good to have the intended behavior documented somewhere > so we can try to work towards consistent in case someone tries it in the > future. > > My nouveau change runs afoul of assumptions in the tegra driver, but > that's easy enough to fix in lockstep if desired. > > Also, heads up: I'll ping you on my format modifier cleanup MR once I've > pushed the latest version. The panfrost modifier usage was harder to > merge into the refactoring than most, so it'll be good to have your > review and if you have time, some testing. I think I landed on an > elegant solution, but open to suggestions. > > Thanks, > -James >
At least for now I've created an MR to revert the change: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6300 But it seems like there was probably a good reason why it got added? Happy to have better fixes, but that's the best we've got so far I think? Thierry, what do you think? On Wed, Aug 12, 2020 at 8:51 PM Karol Herbst <kherbst@redhat.com> wrote: > > in case you all were wondering, it works on xorg-server git because of > this commit: https://gitlab.freedesktop.org/xorg/xserver/-/commit/9b8999411033c9473cd68e92e4690a91aecf5b95 > > On Wed, Aug 12, 2020 at 8:25 PM James Jones <jajones@nvidia.com> wrote: > > > > On 8/12/20 10:40 AM, Alyssa Rosenzweig wrote: > > >> ...and in merging my code with Alyssa's new panfrost format modifier > > >> support, I see panfrost does the opposite of this and treats a format > > >> modifier list of only INVALID as "don't care". I modeled the new nouveau > > >> behavior on the Iris driver. Now I'm not sure which is correct :-( > > > > > > ....and neither am I. Uh-oh. > > > > > > I modeled the panfrost code after v3d_resource_create_with_modifiers, > > > which treats INVALID as "don't care". I can confirm the panfrost code > > > works (in the sense that it's functional on the machines I've tested), > > > but I don't know if it is actually correct. I think it is, since > > > otherwise you end up using linear in places it's unnecessary, but I'm > > > not sure where this is spec'd. > > > > It would depend on whether an app actually calls the function this way, > > and whether that app was tested I suppose. If I'm interpreting the Iris > > code correctly and it doesn't break anything, then I'm assuming both > > implementations are equally valid in that nothing exercises this path, > > but it would be good to have the intended behavior documented somewhere > > so we can try to work towards consistent in case someone tries it in the > > future. > > > > My nouveau change runs afoul of assumptions in the tegra driver, but > > that's easy enough to fix in lockstep if desired. > > > > Also, heads up: I'll ping you on my format modifier cleanup MR once I've > > pushed the latest version. The panfrost modifier usage was harder to > > merge into the refactoring than most, so it'll be good to have your > > review and if you have time, some testing. I think I landed on an > > elegant solution, but open to suggestions. > > > > Thanks, > > -James > >
btw, I just noticed that wayland with gnome-shell is totally busted. With this MR it at least displays something, but without it doesn't work at all. On Thu, Aug 13, 2020 at 3:00 PM Karol Herbst <kherbst@redhat.com> wrote: > > At least for now I've created an MR to revert the change: > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6300 > > But it seems like there was probably a good reason why it got added? > Happy to have better fixes, but that's the best we've got so far I > think? > > Thierry, what do you think? > > On Wed, Aug 12, 2020 at 8:51 PM Karol Herbst <kherbst@redhat.com> wrote: > > > > in case you all were wondering, it works on xorg-server git because of > > this commit: https://gitlab.freedesktop.org/xorg/xserver/-/commit/9b8999411033c9473cd68e92e4690a91aecf5b95 > > > > On Wed, Aug 12, 2020 at 8:25 PM James Jones <jajones@nvidia.com> wrote: > > > > > > On 8/12/20 10:40 AM, Alyssa Rosenzweig wrote: > > > >> ...and in merging my code with Alyssa's new panfrost format modifier > > > >> support, I see panfrost does the opposite of this and treats a format > > > >> modifier list of only INVALID as "don't care". I modeled the new nouveau > > > >> behavior on the Iris driver. Now I'm not sure which is correct :-( > > > > > > > > ....and neither am I. Uh-oh. > > > > > > > > I modeled the panfrost code after v3d_resource_create_with_modifiers, > > > > which treats INVALID as "don't care". I can confirm the panfrost code > > > > works (in the sense that it's functional on the machines I've tested), > > > > but I don't know if it is actually correct. I think it is, since > > > > otherwise you end up using linear in places it's unnecessary, but I'm > > > > not sure where this is spec'd. > > > > > > It would depend on whether an app actually calls the function this way, > > > and whether that app was tested I suppose. If I'm interpreting the Iris > > > code correctly and it doesn't break anything, then I'm assuming both > > > implementations are equally valid in that nothing exercises this path, > > > but it would be good to have the intended behavior documented somewhere > > > so we can try to work towards consistent in case someone tries it in the > > > future. > > > > > > My nouveau change runs afoul of assumptions in the tegra driver, but > > > that's easy enough to fix in lockstep if desired. > > > > > > Also, heads up: I'll ping you on my format modifier cleanup MR once I've > > > pushed the latest version. The panfrost modifier usage was harder to > > > merge into the refactoring than most, so it'll be good to have your > > > review and if you have time, some testing. I think I landed on an > > > elegant solution, but open to suggestions. > > > > > > Thanks, > > > -James > > >
another thing: with gsettings set org.gnome.mutter experimental-features '["kms-modifiers"]' it all just works out of the box with wayland, but that won't be enabled for quite some time, so we need to figure out what is broken (less so with my patch) under wayland with gnome :) On Thu, Aug 13, 2020 at 5:39 PM Karol Herbst <kherbst@redhat.com> wrote: > > btw, I just noticed that wayland with gnome-shell is totally busted. > With this MR it at least displays something, but without it doesn't > work at all. > > On Thu, Aug 13, 2020 at 3:00 PM Karol Herbst <kherbst@redhat.com> wrote: > > > > At least for now I've created an MR to revert the change: > > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6300 > > > > But it seems like there was probably a good reason why it got added? > > Happy to have better fixes, but that's the best we've got so far I > > think? > > > > Thierry, what do you think? > > > > On Wed, Aug 12, 2020 at 8:51 PM Karol Herbst <kherbst@redhat.com> wrote: > > > > > > in case you all were wondering, it works on xorg-server git because of > > > this commit: https://gitlab.freedesktop.org/xorg/xserver/-/commit/9b8999411033c9473cd68e92e4690a91aecf5b95 > > > > > > On Wed, Aug 12, 2020 at 8:25 PM James Jones <jajones@nvidia.com> wrote: > > > > > > > > On 8/12/20 10:40 AM, Alyssa Rosenzweig wrote: > > > > >> ...and in merging my code with Alyssa's new panfrost format modifier > > > > >> support, I see panfrost does the opposite of this and treats a format > > > > >> modifier list of only INVALID as "don't care". I modeled the new nouveau > > > > >> behavior on the Iris driver. Now I'm not sure which is correct :-( > > > > > > > > > > ....and neither am I. Uh-oh. > > > > > > > > > > I modeled the panfrost code after v3d_resource_create_with_modifiers, > > > > > which treats INVALID as "don't care". I can confirm the panfrost code > > > > > works (in the sense that it's functional on the machines I've tested), > > > > > but I don't know if it is actually correct. I think it is, since > > > > > otherwise you end up using linear in places it's unnecessary, but I'm > > > > > not sure where this is spec'd. > > > > > > > > It would depend on whether an app actually calls the function this way, > > > > and whether that app was tested I suppose. If I'm interpreting the Iris > > > > code correctly and it doesn't break anything, then I'm assuming both > > > > implementations are equally valid in that nothing exercises this path, > > > > but it would be good to have the intended behavior documented somewhere > > > > so we can try to work towards consistent in case someone tries it in the > > > > future. > > > > > > > > My nouveau change runs afoul of assumptions in the tegra driver, but > > > > that's easy enough to fix in lockstep if desired. > > > > > > > > Also, heads up: I'll ping you on my format modifier cleanup MR once I've > > > > pushed the latest version. The panfrost modifier usage was harder to > > > > merge into the refactoring than most, so it'll be good to have your > > > > review and if you have time, some testing. I think I landed on an > > > > elegant solution, but open to suggestions. > > > > > > > > Thanks, > > > > -James > > > >
I'll defer to Thierry, but I think that may be by design. Tegra format modifiers were added to get things like this working in the first place, right? It's not a regression, is it? Thanks, -James On 8/13/20 10:19 AM, Karol Herbst wrote: > another thing: with gsettings set org.gnome.mutter > experimental-features '["kms-modifiers"]' it all just works out of the > box with wayland, but that won't be enabled for quite some time, so we > need to figure out what is broken (less so with my patch) under > wayland with gnome :) > > On Thu, Aug 13, 2020 at 5:39 PM Karol Herbst <kherbst@redhat.com> wrote: >> >> btw, I just noticed that wayland with gnome-shell is totally busted. >> With this MR it at least displays something, but without it doesn't >> work at all. >> >> On Thu, Aug 13, 2020 at 3:00 PM Karol Herbst <kherbst@redhat.com> wrote: >>> >>> At least for now I've created an MR to revert the change: >>> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6300 >>> >>> But it seems like there was probably a good reason why it got added? >>> Happy to have better fixes, but that's the best we've got so far I >>> think? >>> >>> Thierry, what do you think? >>> >>> On Wed, Aug 12, 2020 at 8:51 PM Karol Herbst <kherbst@redhat.com> wrote: >>>> >>>> in case you all were wondering, it works on xorg-server git because of >>>> this commit: https://gitlab.freedesktop.org/xorg/xserver/-/commit/9b8999411033c9473cd68e92e4690a91aecf5b95 >>>> >>>> On Wed, Aug 12, 2020 at 8:25 PM James Jones <jajones@nvidia.com> wrote: >>>>> >>>>> On 8/12/20 10:40 AM, Alyssa Rosenzweig wrote: >>>>>>> ...and in merging my code with Alyssa's new panfrost format modifier >>>>>>> support, I see panfrost does the opposite of this and treats a format >>>>>>> modifier list of only INVALID as "don't care". I modeled the new nouveau >>>>>>> behavior on the Iris driver. Now I'm not sure which is correct :-( >>>>>> >>>>>> ....and neither am I. Uh-oh. >>>>>> >>>>>> I modeled the panfrost code after v3d_resource_create_with_modifiers, >>>>>> which treats INVALID as "don't care". I can confirm the panfrost code >>>>>> works (in the sense that it's functional on the machines I've tested), >>>>>> but I don't know if it is actually correct. I think it is, since >>>>>> otherwise you end up using linear in places it's unnecessary, but I'm >>>>>> not sure where this is spec'd. >>>>> >>>>> It would depend on whether an app actually calls the function this way, >>>>> and whether that app was tested I suppose. If I'm interpreting the Iris >>>>> code correctly and it doesn't break anything, then I'm assuming both >>>>> implementations are equally valid in that nothing exercises this path, >>>>> but it would be good to have the intended behavior documented somewhere >>>>> so we can try to work towards consistent in case someone tries it in the >>>>> future. >>>>> >>>>> My nouveau change runs afoul of assumptions in the tegra driver, but >>>>> that's easy enough to fix in lockstep if desired. >>>>> >>>>> Also, heads up: I'll ping you on my format modifier cleanup MR once I've >>>>> pushed the latest version. The panfrost modifier usage was harder to >>>>> merge into the refactoring than most, so it'll be good to have your >>>>> review and if you have time, some testing. I think I landed on an >>>>> elegant solution, but open to suggestions. >>>>> >>>>> Thanks, >>>>> -James >>>>> > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel >
On Thu, Aug 13, 2020 at 7:45 PM James Jones <jajones@nvidia.com> wrote: > > I'll defer to Thierry, but I think that may be by design. Tegra format > modifiers were added to get things like this working in the first place, > right? It's not a regression, is it? > That would be slightly annoying as this would mean by design it's broken by default :/ Also, we have no Xorg release supporting modifiers anyway and it does seem to work with X 1.20.8 (which doesn't enable modifier support). And I talked with Jonas (working on mutter) about it and there were no plans to turn on modifier support by default at this point. > Thanks, > -James > > On 8/13/20 10:19 AM, Karol Herbst wrote: > > another thing: with gsettings set org.gnome.mutter > > experimental-features '["kms-modifiers"]' it all just works out of the > > box with wayland, but that won't be enabled for quite some time, so we > > need to figure out what is broken (less so with my patch) under > > wayland with gnome :) > > > > On Thu, Aug 13, 2020 at 5:39 PM Karol Herbst <kherbst@redhat.com> wrote: > >> > >> btw, I just noticed that wayland with gnome-shell is totally busted. > >> With this MR it at least displays something, but without it doesn't > >> work at all. > >> > >> On Thu, Aug 13, 2020 at 3:00 PM Karol Herbst <kherbst@redhat.com> wrote: > >>> > >>> At least for now I've created an MR to revert the change: > >>> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6300 > >>> > >>> But it seems like there was probably a good reason why it got added? > >>> Happy to have better fixes, but that's the best we've got so far I > >>> think? > >>> > >>> Thierry, what do you think? > >>> > >>> On Wed, Aug 12, 2020 at 8:51 PM Karol Herbst <kherbst@redhat.com> wrote: > >>>> > >>>> in case you all were wondering, it works on xorg-server git because of > >>>> this commit: https://gitlab.freedesktop.org/xorg/xserver/-/commit/9b8999411033c9473cd68e92e4690a91aecf5b95 > >>>> > >>>> On Wed, Aug 12, 2020 at 8:25 PM James Jones <jajones@nvidia.com> wrote: > >>>>> > >>>>> On 8/12/20 10:40 AM, Alyssa Rosenzweig wrote: > >>>>>>> ...and in merging my code with Alyssa's new panfrost format modifier > >>>>>>> support, I see panfrost does the opposite of this and treats a format > >>>>>>> modifier list of only INVALID as "don't care". I modeled the new nouveau > >>>>>>> behavior on the Iris driver. Now I'm not sure which is correct :-( > >>>>>> > >>>>>> ....and neither am I. Uh-oh. > >>>>>> > >>>>>> I modeled the panfrost code after v3d_resource_create_with_modifiers, > >>>>>> which treats INVALID as "don't care". I can confirm the panfrost code > >>>>>> works (in the sense that it's functional on the machines I've tested), > >>>>>> but I don't know if it is actually correct. I think it is, since > >>>>>> otherwise you end up using linear in places it's unnecessary, but I'm > >>>>>> not sure where this is spec'd. > >>>>> > >>>>> It would depend on whether an app actually calls the function this way, > >>>>> and whether that app was tested I suppose. If I'm interpreting the Iris > >>>>> code correctly and it doesn't break anything, then I'm assuming both > >>>>> implementations are equally valid in that nothing exercises this path, > >>>>> but it would be good to have the intended behavior documented somewhere > >>>>> so we can try to work towards consistent in case someone tries it in the > >>>>> future. > >>>>> > >>>>> My nouveau change runs afoul of assumptions in the tegra driver, but > >>>>> that's easy enough to fix in lockstep if desired. > >>>>> > >>>>> Also, heads up: I'll ping you on my format modifier cleanup MR once I've > >>>>> pushed the latest version. The panfrost modifier usage was harder to > >>>>> merge into the refactoring than most, so it'll be good to have your > >>>>> review and if you have time, some testing. I think I landed on an > >>>>> elegant solution, but open to suggestions. > >>>>> > >>>>> Thanks, > >>>>> -James > >>>>> > > > > _______________________________________________ > > dri-devel mailing list > > dri-devel@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/dri-devel > > >
On Wed, Aug 12, 2020 at 10:03:46AM -0700, James Jones wrote: > On 8/12/20 5:37 AM, Ilia Mirkin wrote: > > On Wed, Aug 12, 2020 at 8:24 AM Karol Herbst <kherbst@redhat.com> wrote: > > > > > > On Wed, Aug 12, 2020 at 12:43 PM Karol Herbst <kherbst@redhat.com> wrote: > > > > > > > > On Wed, Aug 12, 2020 at 12:27 PM Karol Herbst <kherbst@redhat.com> wrote: > > > > > > > > > > On Wed, Aug 12, 2020 at 2:19 AM James Jones <jajones@nvidia.com> wrote: > > > > > > > > > > > > Sorry for the slow reply here as well. I've been in the process of > > > > > > rebasing and reworking the userspace patches. I'm not clear my changes > > > > > > will address the Jetson Nano issue, but if you'd like to try them, the > > > > > > latest userspace changes are available here: > > > > > > > > > > > > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3724 > > > > > > > > > > > > And the tegra-drm kernel patches are here: > > > > > > > > > > > > > > > > > > https://patchwork.ozlabs.org/project/linux-tegra/patch/20191217005205.2573-1-jajones@nvidia.com/ > > > > > > > > > > > > Those + the kernel changes addressed in this thread are everything I had > > > > > > outstanding. > > > > > > > > > > > > > > > > I don't know if that's caused by your changes or not, but now the > > > > > assert I hit is a different one pointing out that > > > > > nvc0_miptree_select_best_modifier fails in a certain case and returns > > > > > MOD_INVALID... anyway, it seems like with your patches applied it's > > > > > now way easier to debug and figure out what's going wrong, so maybe I > > > > > can figure it out now :) > > > > > > > > > > > > > collected some information which might help to track it down. > > > > > > > > src/gallium/frontends/dri/dri2.c:648 is the assert hit: assert(*zsbuf) > > > > > > > > templ is {reference = {count = 0}, width0 = 300, height0 = 300, depth0 > > > > = 1, array_size = 1, format = PIPE_FORMAT_Z24X8_UNORM, target = > > > > PIPE_TEXTURE_2D, last_level = 0, nr_samples = 0, nr_storage_samples = > > > > 0, usage = 0, bind = 1, flags = 0, next = 0x0, screen = 0x0} > > > > > > > > inside tegra_screen_resource_create modifier says > > > > DRM_FORMAT_MOD_INVALID as template->bind is 1 > > > > > > > > and nvc0_miptree_select_best_modifier returns DRM_FORMAT_MOD_INVALID, > > > > so the call just returns NULL leading to the assert. > > > > > > > > Btw, this is on Xorg-1.20.8-1.fc32.aarch64 with glxgears. > > > > > > > > > > So I digged a bit deeper and here is what tripps it of: > > > > > > when the context gets made current, the normal framebuffer validation > > > and render buffer allocation is done, but we end up inside > > > tegra_screen_resource_create at some point with PIPE_BIND_SCANOUT set > > > in template->bind. Now the tegra driver forces the > > > DRM_FORMAT_MOD_LINEAR modifier and calls into > > > resource_create_with_modifiers. > > > > > > If it wouldn't do that, nouveau would allocate a tiled buffer, with > > > that it's linear and we at some point end up with an assert about a > > > depth_stencil buffer being there even though it shouldn't. If I always > > > use DRM_FORMAT_MOD_INVALID in tegra_screen_resource_create, things > > > just work. > > > > > > That's kind of the cause I pinpointed the issue down to. But I have no > > > idea what's supposed to happen and what the actual bug is. > > > > Yeah, the bug with tegra has always been "trying to render to linear > > color + tiled depth", which the hardware plain doesn't support. (And > > linear depth isn't a thing.) > > > > Question is whether what it's doing necessary. PIPE_BIND_SCANOUT > > (/linear) requirements are needed for DRI2 to work (well, maybe not in > > theory, but at least in practice the nouveau ddx expects linear > > buffers). However tegra operates on a more DRI3-like basis, so with > > "client" allocations, tiled should work OK as long as there's > > something in tegra to copy it to linear when necessary? > > I can confirm the above: Our hardware can't render to linear depth buffers, > nor can it mix linear color buffers with block linear depth buffers. > > I think there's a misunderstanding on expected behavior of > resource_create_with_modifiers() here too: tegra_screen_resource_create() is > passing DRM_FORMAT_MOD_INVALID as the only modifier in non-scanout cases. > Previously, I believe nouveau may have treated that as "no modifiers > specified. Fall back to internal layout selection logic", but in my patches > I "fixed" it to match other drivers' behavior, in that allocation will fail > if that is the only modifier in the list, since it is equivalent to passing > in a list containing only unsupported modifiers. To get fallback behavior, > tegra_screen_resource_create() should pass in (NULL, 0) for (modifiers, > count), or just call resource_create() on the underlying screen instead. > > Beyond that, I can only offer my thoughts based on analysis of the code > referenced here so far: > > While I've learned from the origins of this thread applications/things > external to Mesa in general shouldn't be querying format modifiers of > buffers created without format modifiers, tegra is a Mesa internal component > that already has some intimate knowledge of how the nouveau driver it sits > on top of works. Nouveau will always be able to construct and return a > valid format modifier for unorm single sampled color buffers (and hopefully, > anything that can scan out going forward), both before and after my patches > I believe, regardless of how they were allocated. After my patches, it > should even work for things that can't scan out in theory. Hence, looking > at this without knowledge of what motivated the original changes, it seems > like tegra_screen_resource_create should just naively forward the > resource_create() call, relying on nouveau to select a layout and provide a > valid modifier when queried for import. As Karol notes, this works fine for > at least this simple test case, and it's what nouveau itself would be doing > with an equivalent callstack, excepting the modifier query, so I find it > hard to believe it breaks some application behavior. It'll also end up > being equivalent (in end result, not quite semantically) to what > dri3_alloc_render_buffer() was doing prior to the patch mentioned that broke > things for Karol, so certainly for the DRI3 usage it's the right behavior. > > Ilia, what in the nouveau DDX (As in Xfree86 DDX?) assumes linear buffers? > It sounds like you don't think it will interact poorly with this path > regardless? Thierry, do you recall what motivated the force-linear code > here? This would've clearly been a good opportunity to leave a comment as to why this was, but the one that's in place in tegra_screen_resource_create() doesn't do a good job of conveying what I was thinking at the time. This is now all a very long time ago, so I don't recall the exact details. However the intention at the time was to pass the invalid modifier in the default case because I wanted Nouveau to pick whatever it wanted, assuming that we could deal with all modifiers that it could throw at us for display. At least at the time that was true, and I do think you're correct that Nouveau used to treat DRM_FORMAT_MOD_INVALID as "don't care" and then tegra_screen_import_resource() would query the modifier from Nouveau. I don't think I ever ran into the situation where Nouveau would use a modifier that we couldn't deal with (i.e. one of the "legacy" modifiers after your patch). I /think/ the linear requirement, judging by the comment, was to support certain cases that I was running into that didn't support modifiers at all, so I think the assumption was that they wouldn't be able to create the framebuffer with DRM_IOCTL_MODE_ADDFB2 and hence pitch-linear would have to be assumed. I vaguely recall that this might have been with kmscube and/or certain versions of the X server (I also seem to have a vague memory that glamor being used might have been responsible for this happening or not). Given that many things have now changed it might be worth rerunning all of those use-cases again and trace what types of resources are being created in the process and maybe make this a little saner. > As to why this works for Thierry and not Karol, that's confusing. Are you > both using the same X11 DDX (modesetting I assume?) and X server versions? > Could it be a difference in client-side DRI library code somehow? It looks like Karol might have found a commit that fixes this in the X server git. I'll run a couple of tests to see if I can reproduce with a version prior to that "fix". Thierry
On Thu, Aug 13, 2020 at 03:00:37PM +0200, Karol Herbst wrote: > At least for now I've created an MR to revert the change: > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6300 > > But it seems like there was probably a good reason why it got added? > Happy to have better fixes, but that's the best we've got so far I > think? > > Thierry, what do you think? I did find the following via my mailbox (finally collecting all those emails is paying off =): https://lists.freedesktop.org/archives/mesa-dev/2018-May/196026.html Cc'ing Daniel on the off chance that he remembers what the use-case was. I vaguely recall that Daniel was pointing out an issue on IRC that caused a series of fixes, including this change. Given that this was all done because of framebuffer modifiers, I think my earlier hunch about this being related to the case of non-modifier userspace is correct and we do run into issues with userspace that doesn't use framebuffer modifiers at all. In that case we basically have to assume pitch-linear because userspace will use DRM_IOCTL_MODE_ADDFB, which doesn't support modifiers. Thierry
On Fri, Aug 14, 2020 at 3:40 PM Thierry Reding <thierry.reding@gmail.com> wrote: > > On Wed, Aug 12, 2020 at 10:03:46AM -0700, James Jones wrote: > > On 8/12/20 5:37 AM, Ilia Mirkin wrote: > > > On Wed, Aug 12, 2020 at 8:24 AM Karol Herbst <kherbst@redhat.com> wrote: > > > > > > > > On Wed, Aug 12, 2020 at 12:43 PM Karol Herbst <kherbst@redhat.com> wrote: > > > > > > > > > > On Wed, Aug 12, 2020 at 12:27 PM Karol Herbst <kherbst@redhat.com> wrote: > > > > > > > > > > > > On Wed, Aug 12, 2020 at 2:19 AM James Jones <jajones@nvidia.com> wrote: > > > > > > > > > > > > > > Sorry for the slow reply here as well. I've been in the process of > > > > > > > rebasing and reworking the userspace patches. I'm not clear my changes > > > > > > > will address the Jetson Nano issue, but if you'd like to try them, the > > > > > > > latest userspace changes are available here: > > > > > > > > > > > > > > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3724 > > > > > > > > > > > > > > And the tegra-drm kernel patches are here: > > > > > > > > > > > > > > > > > > > > > https://patchwork.ozlabs.org/project/linux-tegra/patch/20191217005205.2573-1-jajones@nvidia.com/ > > > > > > > > > > > > > > Those + the kernel changes addressed in this thread are everything I had > > > > > > > outstanding. > > > > > > > > > > > > > > > > > > > I don't know if that's caused by your changes or not, but now the > > > > > > assert I hit is a different one pointing out that > > > > > > nvc0_miptree_select_best_modifier fails in a certain case and returns > > > > > > MOD_INVALID... anyway, it seems like with your patches applied it's > > > > > > now way easier to debug and figure out what's going wrong, so maybe I > > > > > > can figure it out now :) > > > > > > > > > > > > > > > > collected some information which might help to track it down. > > > > > > > > > > src/gallium/frontends/dri/dri2.c:648 is the assert hit: assert(*zsbuf) > > > > > > > > > > templ is {reference = {count = 0}, width0 = 300, height0 = 300, depth0 > > > > > = 1, array_size = 1, format = PIPE_FORMAT_Z24X8_UNORM, target = > > > > > PIPE_TEXTURE_2D, last_level = 0, nr_samples = 0, nr_storage_samples = > > > > > 0, usage = 0, bind = 1, flags = 0, next = 0x0, screen = 0x0} > > > > > > > > > > inside tegra_screen_resource_create modifier says > > > > > DRM_FORMAT_MOD_INVALID as template->bind is 1 > > > > > > > > > > and nvc0_miptree_select_best_modifier returns DRM_FORMAT_MOD_INVALID, > > > > > so the call just returns NULL leading to the assert. > > > > > > > > > > Btw, this is on Xorg-1.20.8-1.fc32.aarch64 with glxgears. > > > > > > > > > > > > > So I digged a bit deeper and here is what tripps it of: > > > > > > > > when the context gets made current, the normal framebuffer validation > > > > and render buffer allocation is done, but we end up inside > > > > tegra_screen_resource_create at some point with PIPE_BIND_SCANOUT set > > > > in template->bind. Now the tegra driver forces the > > > > DRM_FORMAT_MOD_LINEAR modifier and calls into > > > > resource_create_with_modifiers. > > > > > > > > If it wouldn't do that, nouveau would allocate a tiled buffer, with > > > > that it's linear and we at some point end up with an assert about a > > > > depth_stencil buffer being there even though it shouldn't. If I always > > > > use DRM_FORMAT_MOD_INVALID in tegra_screen_resource_create, things > > > > just work. > > > > > > > > That's kind of the cause I pinpointed the issue down to. But I have no > > > > idea what's supposed to happen and what the actual bug is. > > > > > > Yeah, the bug with tegra has always been "trying to render to linear > > > color + tiled depth", which the hardware plain doesn't support. (And > > > linear depth isn't a thing.) > > > > > > Question is whether what it's doing necessary. PIPE_BIND_SCANOUT > > > (/linear) requirements are needed for DRI2 to work (well, maybe not in > > > theory, but at least in practice the nouveau ddx expects linear > > > buffers). However tegra operates on a more DRI3-like basis, so with > > > "client" allocations, tiled should work OK as long as there's > > > something in tegra to copy it to linear when necessary? > > > > I can confirm the above: Our hardware can't render to linear depth buffers, > > nor can it mix linear color buffers with block linear depth buffers. > > > > I think there's a misunderstanding on expected behavior of > > resource_create_with_modifiers() here too: tegra_screen_resource_create() is > > passing DRM_FORMAT_MOD_INVALID as the only modifier in non-scanout cases. > > Previously, I believe nouveau may have treated that as "no modifiers > > specified. Fall back to internal layout selection logic", but in my patches > > I "fixed" it to match other drivers' behavior, in that allocation will fail > > if that is the only modifier in the list, since it is equivalent to passing > > in a list containing only unsupported modifiers. To get fallback behavior, > > tegra_screen_resource_create() should pass in (NULL, 0) for (modifiers, > > count), or just call resource_create() on the underlying screen instead. > > > > Beyond that, I can only offer my thoughts based on analysis of the code > > referenced here so far: > > > > While I've learned from the origins of this thread applications/things > > external to Mesa in general shouldn't be querying format modifiers of > > buffers created without format modifiers, tegra is a Mesa internal component > > that already has some intimate knowledge of how the nouveau driver it sits > > on top of works. Nouveau will always be able to construct and return a > > valid format modifier for unorm single sampled color buffers (and hopefully, > > anything that can scan out going forward), both before and after my patches > > I believe, regardless of how they were allocated. After my patches, it > > should even work for things that can't scan out in theory. Hence, looking > > at this without knowledge of what motivated the original changes, it seems > > like tegra_screen_resource_create should just naively forward the > > resource_create() call, relying on nouveau to select a layout and provide a > > valid modifier when queried for import. As Karol notes, this works fine for > > at least this simple test case, and it's what nouveau itself would be doing > > with an equivalent callstack, excepting the modifier query, so I find it > > hard to believe it breaks some application behavior. It'll also end up > > being equivalent (in end result, not quite semantically) to what > > dri3_alloc_render_buffer() was doing prior to the patch mentioned that broke > > things for Karol, so certainly for the DRI3 usage it's the right behavior. > > > > Ilia, what in the nouveau DDX (As in Xfree86 DDX?) assumes linear buffers? > > It sounds like you don't think it will interact poorly with this path > > regardless? Thierry, do you recall what motivated the force-linear code > > here? > > This would've clearly been a good opportunity to leave a comment as to > why this was, but the one that's in place in > tegra_screen_resource_create() doesn't do a good job of conveying what > I was thinking at the time. > > This is now all a very long time ago, so I don't recall the exact > details. However the intention at the time was to pass the invalid > modifier in the default case because I wanted Nouveau to pick whatever > it wanted, assuming that we could deal with all modifiers that it could > throw at us for display. At least at the time that was true, and I do > think you're correct that Nouveau used to treat DRM_FORMAT_MOD_INVALID > as "don't care" and then tegra_screen_import_resource() would query the > modifier from Nouveau. I don't think I ever ran into the situation where > Nouveau would use a modifier that we couldn't deal with (i.e. one of the > "legacy" modifiers after your patch). > > I /think/ the linear requirement, judging by the comment, was to support > certain cases that I was running into that didn't support modifiers at > all, so I think the assumption was that they wouldn't be able to create > the framebuffer with DRM_IOCTL_MODE_ADDFB2 and hence pitch-linear would > have to be assumed. > > I vaguely recall that this might have been with kmscube and/or certain > versions of the X server (I also seem to have a vague memory that glamor > being used might have been responsible for this happening or not). > > Given that many things have now changed it might be worth rerunning all > of those use-cases again and trace what types of resources are being > created in the process and maybe make this a little saner. > > > As to why this works for Thierry and not Karol, that's confusing. Are you > > both using the same X11 DDX (modesetting I assume?) and X server versions? > > Could it be a difference in client-side DRI library code somehow? > > It looks like Karol might have found a commit that fixes this in the X > server git. I'll run a couple of tests to see if I can reproduce with a > version prior to that "fix". > before spending too much time on that, Xorg that old usually does not build anymore, but any 1.20.z release should trigger the issue, so you can just go with the newest release. > Thierry
On Thu, Aug 13, 2020 at 07:48:39PM +0200, Karol Herbst wrote: > On Thu, Aug 13, 2020 at 7:45 PM James Jones <jajones@nvidia.com> wrote: > > > > I'll defer to Thierry, but I think that may be by design. Tegra format > > modifiers were added to get things like this working in the first place, > > right? It's not a regression, is it? > > > > That would be slightly annoying as this would mean by design it's > broken by default :/ Also, we have no Xorg release supporting > modifiers anyway and it does seem to work with X 1.20.8 (which doesn't > enable modifier support). And I talked with Jonas (working on mutter) > about it and there were no plans to turn on modifier support by > default at this point. I thought you said earlier that 1.20.8 didn't work and was hitting the assertion? Thierry
On Fri, Aug 14, 2020 at 3:57 PM Thierry Reding <thierry.reding@gmail.com> wrote: > > On Thu, Aug 13, 2020 at 07:48:39PM +0200, Karol Herbst wrote: > > On Thu, Aug 13, 2020 at 7:45 PM James Jones <jajones@nvidia.com> wrote: > > > > > > I'll defer to Thierry, but I think that may be by design. Tegra format > > > modifiers were added to get things like this working in the first place, > > > right? It's not a regression, is it? > > > > > > > That would be slightly annoying as this would mean by design it's > > broken by default :/ Also, we have no Xorg release supporting > > modifiers anyway and it does seem to work with X 1.20.8 (which doesn't > > enable modifier support). And I talked with Jonas (working on mutter) > > about it and there were no plans to turn on modifier support by > > default at this point. > > I thought you said earlier that 1.20.8 didn't work and was hitting the > assertion? > uhm, I forgot to mention that it works with the patch I wrote: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6300 > Thierry
On Thu, Aug 13, 2020 at 10:45:34AM -0700, James Jones wrote: > I'll defer to Thierry, but I think that may be by design. Tegra format > modifiers were added to get things like this working in the first place, > right? It's not a regression, is it? I recall that things used to work with or without modifiers at some point. That was basically the point of the "pitch-linear by default" patch, see: https://gitlab.freedesktop.org/mesa/mesa/-/commit/9603d81df05105857b676f20dff964ef3ab0ecff If we don't force pitch-linear for cases where we don't have modifiers, there's no way we can properly display these as framebuffers because we lack the modifier information at the kernel level. Thierry
On Thu, Aug 13, 2020 at 05:39:39PM +0200, Karol Herbst wrote: > btw, I just noticed that wayland with gnome-shell is totally busted. > With this MR it at least displays something, but without it doesn't > work at all. Interesting, one of my typical test cases is to run Weston with a couple of test programs (like weston-simple-egl). Those usually work. I'll go run a few more tests to see where we are. To clarify, is this gnome-shell/wayland issue happening with Mesa's mainline, or with James' patches already applied? Thierry
On Fri, Aug 14, 2020 at 03:59:16PM +0200, Karol Herbst wrote: > On Fri, Aug 14, 2020 at 3:57 PM Thierry Reding <thierry.reding@gmail.com> wrote: > > > > On Thu, Aug 13, 2020 at 07:48:39PM +0200, Karol Herbst wrote: > > > On Thu, Aug 13, 2020 at 7:45 PM James Jones <jajones@nvidia.com> wrote: > > > > > > > > I'll defer to Thierry, but I think that may be by design. Tegra format > > > > modifiers were added to get things like this working in the first place, > > > > right? It's not a regression, is it? > > > > > > > > > > That would be slightly annoying as this would mean by design it's > > > broken by default :/ Also, we have no Xorg release supporting > > > modifiers anyway and it does seem to work with X 1.20.8 (which doesn't > > > enable modifier support). And I talked with Jonas (working on mutter) > > > about it and there were no plans to turn on modifier support by > > > default at this point. > > > > I thought you said earlier that 1.20.8 didn't work and was hitting the > > assertion? > > > > uhm, I forgot to mention that it works with the patch I wrote: > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6300 Okay, good to know. Like I said in another subthread, I think we need to make sure that this doesn't break other use-cases. I vaguely recall that there was one specific configuration that broke without that and I think it was when glamor was disabled because that caused X to allocate buffers without modifiers set. Let me try if I can reproduce such a build somehow and check if that fails again with the revert. Thierry
On Fri, Aug 14, 2020 at 4:05 PM Thierry Reding <thierry.reding@gmail.com> wrote: > > On Thu, Aug 13, 2020 at 10:45:34AM -0700, James Jones wrote: > > I'll defer to Thierry, but I think that may be by design. Tegra format > > modifiers were added to get things like this working in the first place, > > right? It's not a regression, is it? > > I recall that things used to work with or without modifiers at some > point. That was basically the point of the "pitch-linear by default" > patch, see: > > https://gitlab.freedesktop.org/mesa/mesa/-/commit/9603d81df05105857b676f20dff964ef3ab0ecff > > If we don't force pitch-linear for cases where we don't have modifiers, > there's no way we can properly display these as framebuffers because we > lack the modifier information at the kernel level. > I suspect that's related to https://gitlab.freedesktop.org/mesa/mesa/commit/c56fe4118a2dd991ff1b2a532c0f234eddd435e9 > Thierry
On Fri, Aug 14, 2020 at 4:08 PM Thierry Reding <thierry.reding@gmail.com> wrote: > > On Thu, Aug 13, 2020 at 05:39:39PM +0200, Karol Herbst wrote: > > btw, I just noticed that wayland with gnome-shell is totally busted. > > With this MR it at least displays something, but without it doesn't > > work at all. > > Interesting, one of my typical test cases is to run Weston with a couple > of test programs (like weston-simple-egl). Those usually work. I'll go > run a few more tests to see where we are. > > To clarify, is this gnome-shell/wayland issue happening with Mesa's > mainline, or with James' patches already applied? > mainline. It does work for me on weston, but that's because weston is always modifier aware afaik. For gnome-shell/wayland we have to enable it to make it work. > Thierry
On Fri, Aug 14, 2020 at 04:45:39PM +0200, Karol Herbst wrote: > On Fri, Aug 14, 2020 at 4:08 PM Thierry Reding <thierry.reding@gmail.com> wrote: > > > > On Thu, Aug 13, 2020 at 05:39:39PM +0200, Karol Herbst wrote: > > > btw, I just noticed that wayland with gnome-shell is totally busted. > > > With this MR it at least displays something, but without it doesn't > > > work at all. > > > > Interesting, one of my typical test cases is to run Weston with a couple > > of test programs (like weston-simple-egl). Those usually work. I'll go > > run a few more tests to see where we are. > > > > To clarify, is this gnome-shell/wayland issue happening with Mesa's > > mainline, or with James' patches already applied? > > > > mainline. It does work for me on weston, but that's because weston is > always modifier aware afaik. For gnome-shell/wayland we have to enable > it to make it work. For some reason I can't get my mouse to work in Weston and it seems like that's the only way to start a terminal... But sounds like that wouldn't be any good anyway since it's different from that use-case. Apart from building gnome-shell, which I recall has a large number of dependencies, are you aware of another use-case that would allow testing the code paths with no modifiers? Sounds like perhaps that would be interesting to add to Weston as a knob to test these somewhat legacy paths. Thierry
On Fri, Aug 14, 2020 at 04:44:43PM +0200, Karol Herbst wrote: > On Fri, Aug 14, 2020 at 4:05 PM Thierry Reding <thierry.reding@gmail.com> wrote: > > > > On Thu, Aug 13, 2020 at 10:45:34AM -0700, James Jones wrote: > > > I'll defer to Thierry, but I think that may be by design. Tegra format > > > modifiers were added to get things like this working in the first place, > > > right? It's not a regression, is it? > > > > I recall that things used to work with or without modifiers at some > > point. That was basically the point of the "pitch-linear by default" > > patch, see: > > > > https://gitlab.freedesktop.org/mesa/mesa/-/commit/9603d81df05105857b676f20dff964ef3ab0ecff > > > > If we don't force pitch-linear for cases where we don't have modifiers, > > there's no way we can properly display these as framebuffers because we > > lack the modifier information at the kernel level. > > > > I suspect that's related to > https://gitlab.freedesktop.org/mesa/mesa/commit/c56fe4118a2dd991ff1b2a532c0f234eddd435e9 But we're checking the the PIPE_USAGE_SCANOUT flag specifically for that type of case, so this should work. On the other hand, it does sound like that logic might actually work, but for some reason Nouveau then ends up allocating a linear render buffer and a tiled depth buffer, and that's the combination that doesn't work. Did I understand that correctly? So it sounds like there's a few options: a) Make Nouveau render without a depth buffer in these cases (not sure if that's even possible). b) Allocate a linear buffer *in addition* to the normal buffer that Nouveau allocates (with whatever it selects as the default layout) and then blit from the tiled buffer to the linear buffer everytime we need to. I think that'd be basically reproducing the code that is controlled by the DRI_PRIME environment variable. c) Accept that modifiers are the way to go and rely on them for proper functionality. This is obviously a really bad solution because not everyone has transitioned to framebuffer modifiers yet. On the other hand, this is precisely the kind of use-case that framebuffer modifiers were meant to address, so it isn't all that far-fetched to make them a requirement. None of those are really great options... does anyone have any other ideas? Thierry
On Fri, Aug 14, 2020 at 5:35 PM Thierry Reding <thierry.reding@gmail.com> wrote: > > On Fri, Aug 14, 2020 at 04:44:43PM +0200, Karol Herbst wrote: > > On Fri, Aug 14, 2020 at 4:05 PM Thierry Reding <thierry.reding@gmail.com> wrote: > > > > > > On Thu, Aug 13, 2020 at 10:45:34AM -0700, James Jones wrote: > > > > I'll defer to Thierry, but I think that may be by design. Tegra format > > > > modifiers were added to get things like this working in the first place, > > > > right? It's not a regression, is it? > > > > > > I recall that things used to work with or without modifiers at some > > > point. That was basically the point of the "pitch-linear by default" > > > patch, see: > > > > > > https://gitlab.freedesktop.org/mesa/mesa/-/commit/9603d81df05105857b676f20dff964ef3ab0ecff > > > > > > If we don't force pitch-linear for cases where we don't have modifiers, > > > there's no way we can properly display these as framebuffers because we > > > lack the modifier information at the kernel level. > > > > > > > I suspect that's related to > > https://gitlab.freedesktop.org/mesa/mesa/commit/c56fe4118a2dd991ff1b2a532c0f234eddd435e9 > > But we're checking the the PIPE_USAGE_SCANOUT flag specifically for that > type of case, so this should work. On the other hand, it does sound like > that logic might actually work, but for some reason Nouveau then ends up > allocating a linear render buffer and a tiled depth buffer, and that's > the combination that doesn't work. Did I understand that correctly? > > So it sounds like there's a few options: > > a) Make Nouveau render without a depth buffer in these cases (not sure > if that's even possible). > > b) Allocate a linear buffer *in addition* to the normal buffer that > Nouveau allocates (with whatever it selects as the default layout) > and then blit from the tiled buffer to the linear buffer everytime > we need to. I think that'd be basically reproducing the code that > is controlled by the DRI_PRIME environment variable. > yeah, I think Ilia suggested something like that a long time ago as well. > c) Accept that modifiers are the way to go and rely on them for proper > functionality. This is obviously a really bad solution because not > everyone has transitioned to framebuffer modifiers yet. On the > other hand, this is precisely the kind of use-case that framebuffer > modifiers were meant to address, so it isn't all that far-fetched > to make them a requirement. > the problem with that is, that by default we don't have modifier aware desktops. gnome-shell by default and no Xorg release. > None of those are really great options... does anyone have any other > ideas? > What I am wondering about is why with my patch it simply works in X, but that could be a mere coincidence. > Thierry
On Fri, Aug 14, 2020 at 5:24 PM Thierry Reding <thierry.reding@gmail.com> wrote: > > On Fri, Aug 14, 2020 at 04:45:39PM +0200, Karol Herbst wrote: > > On Fri, Aug 14, 2020 at 4:08 PM Thierry Reding <thierry.reding@gmail.com> wrote: > > > > > > On Thu, Aug 13, 2020 at 05:39:39PM +0200, Karol Herbst wrote: > > > > btw, I just noticed that wayland with gnome-shell is totally busted. > > > > With this MR it at least displays something, but without it doesn't > > > > work at all. > > > > > > Interesting, one of my typical test cases is to run Weston with a couple > > > of test programs (like weston-simple-egl). Those usually work. I'll go > > > run a few more tests to see where we are. > > > > > > To clarify, is this gnome-shell/wayland issue happening with Mesa's > > > mainline, or with James' patches already applied? > > > > > > > mainline. It does work for me on weston, but that's because weston is > > always modifier aware afaik. For gnome-shell/wayland we have to enable > > it to make it work. > > For some reason I can't get my mouse to work in Weston and it seems like > that's the only way to start a terminal... But sounds like that wouldn't > be any good anyway since it's different from that use-case. Apart from > building gnome-shell, which I recall has a large number of dependencies, > are you aware of another use-case that would allow testing the code > paths with no modifiers? > with the newest l4t releases you can just copy whatever distribution provides filesystem tarballs and generate images yourself. But yeah.. this is still a bit messy to do and I don't have a script which just works without having to go through other steps first :/ Don't know what your system is based on, but the fedora 32 images should more or less work (unless you need your own kernel and stuff) > Sounds like perhaps that would be interesting to add to Weston as a knob > to test these somewhat legacy paths. > > Thierry
On Fri, Aug 14, 2020 at 05:40:34PM +0200, Karol Herbst wrote: > On Fri, Aug 14, 2020 at 5:35 PM Thierry Reding <thierry.reding@gmail.com> wrote: > > > > On Fri, Aug 14, 2020 at 04:44:43PM +0200, Karol Herbst wrote: > > > On Fri, Aug 14, 2020 at 4:05 PM Thierry Reding <thierry.reding@gmail.com> wrote: > > > > > > > > On Thu, Aug 13, 2020 at 10:45:34AM -0700, James Jones wrote: > > > > > I'll defer to Thierry, but I think that may be by design. Tegra format > > > > > modifiers were added to get things like this working in the first place, > > > > > right? It's not a regression, is it? > > > > > > > > I recall that things used to work with or without modifiers at some > > > > point. That was basically the point of the "pitch-linear by default" > > > > patch, see: > > > > > > > > https://gitlab.freedesktop.org/mesa/mesa/-/commit/9603d81df05105857b676f20dff964ef3ab0ecff > > > > > > > > If we don't force pitch-linear for cases where we don't have modifiers, > > > > there's no way we can properly display these as framebuffers because we > > > > lack the modifier information at the kernel level. > > > > > > > > > > I suspect that's related to > > > https://gitlab.freedesktop.org/mesa/mesa/commit/c56fe4118a2dd991ff1b2a532c0f234eddd435e9 > > > > But we're checking the the PIPE_USAGE_SCANOUT flag specifically for that > > type of case, so this should work. On the other hand, it does sound like > > that logic might actually work, but for some reason Nouveau then ends up > > allocating a linear render buffer and a tiled depth buffer, and that's > > the combination that doesn't work. Did I understand that correctly? > > > > So it sounds like there's a few options: > > > > a) Make Nouveau render without a depth buffer in these cases (not sure > > if that's even possible). > > > > b) Allocate a linear buffer *in addition* to the normal buffer that > > Nouveau allocates (with whatever it selects as the default layout) > > and then blit from the tiled buffer to the linear buffer everytime > > we need to. I think that'd be basically reproducing the code that > > is controlled by the DRI_PRIME environment variable. > > > > yeah, I think Ilia suggested something like that a long time ago as well. > > > c) Accept that modifiers are the way to go and rely on them for proper > > functionality. This is obviously a really bad solution because not > > everyone has transitioned to framebuffer modifiers yet. On the > > other hand, this is precisely the kind of use-case that framebuffer > > modifiers were meant to address, so it isn't all that far-fetched > > to make them a requirement. > > > > the problem with that is, that by default we don't have modifier aware > desktops. gnome-shell by default and no Xorg release. > > > None of those are really great options... does anyone have any other > > ideas? > > > > What I am wondering about is why with my patch it simply works in X, > but that could be a mere coincidence. So I was testing your revert with Weston and that "works", but only as in "it doesn't crash". As expected, since there's now a mismatch between what layout Nouveau assumes and the pitch-linear buffer that modesetting assumes it got, it'll now be completely scrambled. That said, even without the revert I can't seem to be able to make Weston work without modifiers support. I'll have to investigate some more why that's not working. Perhaps it doesn't pass the correct usage flags? Thierry
On Fri, Aug 14, 2020 at 6:06 PM Thierry Reding <thierry.reding@gmail.com> wrote: > > On Fri, Aug 14, 2020 at 05:40:34PM +0200, Karol Herbst wrote: > > On Fri, Aug 14, 2020 at 5:35 PM Thierry Reding <thierry.reding@gmail.com> wrote: > > > > > > On Fri, Aug 14, 2020 at 04:44:43PM +0200, Karol Herbst wrote: > > > > On Fri, Aug 14, 2020 at 4:05 PM Thierry Reding <thierry.reding@gmail.com> wrote: > > > > > > > > > > On Thu, Aug 13, 2020 at 10:45:34AM -0700, James Jones wrote: > > > > > > I'll defer to Thierry, but I think that may be by design. Tegra format > > > > > > modifiers were added to get things like this working in the first place, > > > > > > right? It's not a regression, is it? > > > > > > > > > > I recall that things used to work with or without modifiers at some > > > > > point. That was basically the point of the "pitch-linear by default" > > > > > patch, see: > > > > > > > > > > https://gitlab.freedesktop.org/mesa/mesa/-/commit/9603d81df05105857b676f20dff964ef3ab0ecff > > > > > > > > > > If we don't force pitch-linear for cases where we don't have modifiers, > > > > > there's no way we can properly display these as framebuffers because we > > > > > lack the modifier information at the kernel level. > > > > > > > > > > > > > I suspect that's related to > > > > https://gitlab.freedesktop.org/mesa/mesa/commit/c56fe4118a2dd991ff1b2a532c0f234eddd435e9 > > > > > > But we're checking the the PIPE_USAGE_SCANOUT flag specifically for that > > > type of case, so this should work. On the other hand, it does sound like > > > that logic might actually work, but for some reason Nouveau then ends up > > > allocating a linear render buffer and a tiled depth buffer, and that's > > > the combination that doesn't work. Did I understand that correctly? > > > > > > So it sounds like there's a few options: > > > > > > a) Make Nouveau render without a depth buffer in these cases (not sure > > > if that's even possible). > > > > > > b) Allocate a linear buffer *in addition* to the normal buffer that > > > Nouveau allocates (with whatever it selects as the default layout) > > > and then blit from the tiled buffer to the linear buffer everytime > > > we need to. I think that'd be basically reproducing the code that > > > is controlled by the DRI_PRIME environment variable. > > > > > > > yeah, I think Ilia suggested something like that a long time ago as well. > > > > > c) Accept that modifiers are the way to go and rely on them for proper > > > functionality. This is obviously a really bad solution because not > > > everyone has transitioned to framebuffer modifiers yet. On the > > > other hand, this is precisely the kind of use-case that framebuffer > > > modifiers were meant to address, so it isn't all that far-fetched > > > to make them a requirement. > > > > > > > the problem with that is, that by default we don't have modifier aware > > desktops. gnome-shell by default and no Xorg release. > > > > > None of those are really great options... does anyone have any other > > > ideas? > > > > > > > What I am wondering about is why with my patch it simply works in X, > > but that could be a mere coincidence. > > So I was testing your revert with Weston and that "works", but only as > in "it doesn't crash". As expected, since there's now a mismatch between > what layout Nouveau assumes and the pitch-linear buffer that modesetting > assumes it got, it'll now be completely scrambled. That said, even > without the revert I can't seem to be able to make Weston work without > modifiers support. > yeah, that does seem to reproduce what I noticed with gnome-shell/wayland as well. > I'll have to investigate some more why that's not working. Perhaps it > doesn't pass the correct usage flags? > > Thierry
On Fri, Aug 14, 2020 at 06:12:56PM +0200, Karol Herbst wrote: > On Fri, Aug 14, 2020 at 6:06 PM Thierry Reding <thierry.reding@gmail.com> wrote: > > > > On Fri, Aug 14, 2020 at 05:40:34PM +0200, Karol Herbst wrote: > > > On Fri, Aug 14, 2020 at 5:35 PM Thierry Reding <thierry.reding@gmail.com> wrote: > > > > > > > > On Fri, Aug 14, 2020 at 04:44:43PM +0200, Karol Herbst wrote: > > > > > On Fri, Aug 14, 2020 at 4:05 PM Thierry Reding <thierry.reding@gmail.com> wrote: > > > > > > > > > > > > On Thu, Aug 13, 2020 at 10:45:34AM -0700, James Jones wrote: > > > > > > > I'll defer to Thierry, but I think that may be by design. Tegra format > > > > > > > modifiers were added to get things like this working in the first place, > > > > > > > right? It's not a regression, is it? > > > > > > > > > > > > I recall that things used to work with or without modifiers at some > > > > > > point. That was basically the point of the "pitch-linear by default" > > > > > > patch, see: > > > > > > > > > > > > https://gitlab.freedesktop.org/mesa/mesa/-/commit/9603d81df05105857b676f20dff964ef3ab0ecff > > > > > > > > > > > > If we don't force pitch-linear for cases where we don't have modifiers, > > > > > > there's no way we can properly display these as framebuffers because we > > > > > > lack the modifier information at the kernel level. > > > > > > > > > > > > > > > > I suspect that's related to > > > > > https://gitlab.freedesktop.org/mesa/mesa/commit/c56fe4118a2dd991ff1b2a532c0f234eddd435e9 > > > > > > > > But we're checking the the PIPE_USAGE_SCANOUT flag specifically for that > > > > type of case, so this should work. On the other hand, it does sound like > > > > that logic might actually work, but for some reason Nouveau then ends up > > > > allocating a linear render buffer and a tiled depth buffer, and that's > > > > the combination that doesn't work. Did I understand that correctly? > > > > > > > > So it sounds like there's a few options: > > > > > > > > a) Make Nouveau render without a depth buffer in these cases (not sure > > > > if that's even possible). > > > > > > > > b) Allocate a linear buffer *in addition* to the normal buffer that > > > > Nouveau allocates (with whatever it selects as the default layout) > > > > and then blit from the tiled buffer to the linear buffer everytime > > > > we need to. I think that'd be basically reproducing the code that > > > > is controlled by the DRI_PRIME environment variable. > > > > > > > > > > yeah, I think Ilia suggested something like that a long time ago as well. > > > > > > > c) Accept that modifiers are the way to go and rely on them for proper > > > > functionality. This is obviously a really bad solution because not > > > > everyone has transitioned to framebuffer modifiers yet. On the > > > > other hand, this is precisely the kind of use-case that framebuffer > > > > modifiers were meant to address, so it isn't all that far-fetched > > > > to make them a requirement. > > > > > > > > > > the problem with that is, that by default we don't have modifier aware > > > desktops. gnome-shell by default and no Xorg release. > > > > > > > None of those are really great options... does anyone have any other > > > > ideas? > > > > > > > > > > What I am wondering about is why with my patch it simply works in X, > > > but that could be a mere coincidence. > > > > So I was testing your revert with Weston and that "works", but only as > > in "it doesn't crash". As expected, since there's now a mismatch between > > what layout Nouveau assumes and the pitch-linear buffer that modesetting > > assumes it got, it'll now be completely scrambled. That said, even > > without the revert I can't seem to be able to make Weston work without > > modifiers support. > > > > yeah, that does seem to reproduce what I noticed with > gnome-shell/wayland as well. Oh, here's the patch I used to disable framebuffer modifiers in Weston, in case anyone is interested: --- >8 --- diff --git a/libweston/backend-drm/kms.c b/libweston/backend-drm/kms.c index f5215f20d694..889b2444b99f 100644 --- a/libweston/backend-drm/kms.c +++ b/libweston/backend-drm/kms.c @@ -1480,6 +1480,10 @@ init_kms_caps(struct drm_backend *b) else b->fb_modifiers = 0; + if (getenv("WESTON_DISABLE_FB_MODIFIERS")) { + b->fb_modifiers = 0; + } + /* * KMS support for hardware planes cannot properly synchronize * without nuclear page flip. Without nuclear/atomic, hw plane --- >8 --- I suspect that the reason why this works in X but not in Wayland is because X passes the right usage flags, whereas Weston may not. But I'll have to investigate more in order to be sure. In either case, it does seem to me like b) above would be the best solution for the legacy case (i.e. where we don't have modifiers). It's not going to be great in terms of performance because of the extra copy, but I thought I had seen one of the render-only drivers do something similar, so I'll go look at them to see if I can use them as inspiration for implementing something like this on Tegra. This should ideally only be an issue for desktops that don't support FB modifiers at the moment. Even if that's the default for most at this point, hopefully that will change down the road. Perhaps we can go and release X 1.21.0 with that modifier enablement patch and that'll motivate desktops to adopt it as well as the default? Thierry
Hi, On Fri, 14 Aug 2020 at 17:22, Thierry Reding <thierry.reding@gmail.com> wrote: > I suspect that the reason why this works in X but not in Wayland is > because X passes the right usage flags, whereas Weston may not. But I'll > have to investigate more in order to be sure. Weston allocates its own buffers for displaying the result of composition through GBM with USE_SCANOUT, which is definitely correct. Wayland clients (common to all compositors, in Mesa's src/egl/drivers/dri2/platform_wayland.c) allocate with USE_SHARED but _not_ USE_SCANOUT, which is correct in that they are guaranteed to be shared, but not guaranteed to be scanned out. The expectation is that non-scanout-compatible buffers would be rejected by gbm_bo_import if not drmModeAddFB2. One difference between Weston and all other compositors (GNOME Shell, KWin, Sway, etc) is that Weston uses KMS planes for composition when it can (i.e. when gbm_bo_import from dmabuf + drmModeAddFB2 from gbm_bo handle + atomic check succeed), but the other compositors only use the GPU. So if you have different assumptions about the layout of imported buffers between the GPU and KMS, that would explain a fair bit. > Perhaps we can go and release X 1.21.0 with that modifier enablement > patch and that'll motivate desktops to adopt it as well as the default? Unfortunately we don't really have a good way out of this one. They were disabled because the non-modifier path on Intel can be linear or X-tiled (row-major), whereas the modifier path enables Y-tiled (column-major) and compressed layouts. Y-tiled is the most efficient, but Intel could only spare about six transistors for the global FIFO shared between all their plane fetch engines, and Y-tiled blows straight through it. Both X and Shell would thus fail to enable high resolutions or many heads (2x 4K is enough even on modern platforms IIRC), so they just turned modifiers off. The best solution would be to do a global atomic_check across all outputs and just blacklist modifiers until you find one which works, but Shell doesn't yet have that code, and -modesetting ... well, no-one's volunteered to do that yet, or probably ever. Cheers, Daniel
On Fri, Aug 14, 2020 at 7:17 PM Daniel Stone <daniel@fooishbar.org> wrote: > > Hi, > > On Fri, 14 Aug 2020 at 17:22, Thierry Reding <thierry.reding@gmail.com> wrote: > > I suspect that the reason why this works in X but not in Wayland is > > because X passes the right usage flags, whereas Weston may not. But I'll > > have to investigate more in order to be sure. > > Weston allocates its own buffers for displaying the result of > composition through GBM with USE_SCANOUT, which is definitely correct. > > Wayland clients (common to all compositors, in Mesa's > src/egl/drivers/dri2/platform_wayland.c) allocate with USE_SHARED but > _not_ USE_SCANOUT, which is correct in that they are guaranteed to be > shared, but not guaranteed to be scanned out. The expectation is that > non-scanout-compatible buffers would be rejected by gbm_bo_import if > not drmModeAddFB2. > > One difference between Weston and all other compositors (GNOME Shell, > KWin, Sway, etc) is that Weston uses KMS planes for composition when > it can (i.e. when gbm_bo_import from dmabuf + drmModeAddFB2 from > gbm_bo handle + atomic check succeed), but the other compositors only > use the GPU. So if you have different assumptions about the layout of > imported buffers between the GPU and KMS, that would explain a fair > bit. Yeah non-modifiered multi-gpu (of any kind) is pretty much hopeless I think. I guess the only option is if the tegra mesa driver forces linear and an extra copy on everything that's USE_SHARED or USE_SCANOUT. > > Perhaps we can go and release X 1.21.0 with that modifier enablement > > patch and that'll motivate desktops to adopt it as well as the default? > > Unfortunately we don't really have a good way out of this one. They > were disabled because the non-modifier path on Intel can be linear or > X-tiled (row-major), whereas the modifier path enables Y-tiled > (column-major) and compressed layouts. Y-tiled is the most efficient, > but Intel could only spare about six transistors for the global FIFO > shared between all their plane fetch engines, and Y-tiled blows > straight through it. Both X and Shell would thus fail to enable high > resolutions or many heads (2x 4K is enough even on modern platforms > IIRC), so they just turned modifiers off. > > The best solution would be to do a global atomic_check across all > outputs and just blacklist modifiers until you find one which works, > but Shell doesn't yet have that code, and -modesetting ... well, > no-one's volunteered to do that yet, or probably ever. Yeah best bet for modifiered X is Xwayland on top of weston right now :-/ -Daniel
On Fri, Aug 14, 2020 at 07:25:17PM +0200, Daniel Vetter wrote: > On Fri, Aug 14, 2020 at 7:17 PM Daniel Stone <daniel@fooishbar.org> wrote: > > > > Hi, > > > > On Fri, 14 Aug 2020 at 17:22, Thierry Reding <thierry.reding@gmail.com> wrote: > > > I suspect that the reason why this works in X but not in Wayland is > > > because X passes the right usage flags, whereas Weston may not. But I'll > > > have to investigate more in order to be sure. > > > > Weston allocates its own buffers for displaying the result of > > composition through GBM with USE_SCANOUT, which is definitely correct. > > > > Wayland clients (common to all compositors, in Mesa's > > src/egl/drivers/dri2/platform_wayland.c) allocate with USE_SHARED but > > _not_ USE_SCANOUT, which is correct in that they are guaranteed to be > > shared, but not guaranteed to be scanned out. The expectation is that > > non-scanout-compatible buffers would be rejected by gbm_bo_import if > > not drmModeAddFB2. > > > > One difference between Weston and all other compositors (GNOME Shell, > > KWin, Sway, etc) is that Weston uses KMS planes for composition when > > it can (i.e. when gbm_bo_import from dmabuf + drmModeAddFB2 from > > gbm_bo handle + atomic check succeed), but the other compositors only > > use the GPU. So if you have different assumptions about the layout of > > imported buffers between the GPU and KMS, that would explain a fair > > bit. > > Yeah non-modifiered multi-gpu (of any kind) is pretty much hopeless I > think. I guess the only option is if the tegra mesa driver forces > linear and an extra copy on everything that's USE_SHARED or > USE_SCANOUT. I ended up trying this, but this fails for the X case, unfortunately, because there doesn't seem to be a good synchronization point at which the de-tiling blit could be done. Weston and kmscube end up calling a gallium driver's ->flush_resource() implementation, but that never happens for X and glamor. But after looking into this some more, I don't think that's even the problem that we're facing here. The root of the problem that causes the glxgears crash that Karol was originally reporting is because we end up allocating the glxgears pixmaps using the dri3 loader from Mesa. But the dri3 loader will unconditionally pass both __DRI_IMAGE_USE_SHARE and __DRI_IMAGE_USE_SCANOUT, irrespective of whether the buffer will end up being scanned out directly or whether it will be composited onto the root window. What exactly happens depends on whether I run glxgears in fullscreen mode or windowed mode. In windowed mode, the glxgears buffers will be composited onto the root window, so there's no need for the buffers to be scanout-capable. If I modify the dri3 loader to not pass those flags I can make this work just fine. When I run glxgears in fullscreen mode, the modesetting driver ends up wanting to display the glxgears buffer directly on screen, without compositing it onto the root window. This ends up working if I leave out the _USE_SHARE and _USE_SCANOUT flags, but I notice that the kernel then complains about being unable to create a framebuffer, which in turn is caused by the fact that those buffers are not exported (the Tegra Mesa driver only exports/imports buffers that are meant for scanout, under the assumption that those are the only ones that will ever need to be used by KMS) and therefore Tegra DRM doesn't have a valid handle for them. So I think an ideal solution would probably be for glxgears to somehow pass better usage information when allocating buffers, but I suspect that that's just not possible, or would be way too much work and require additional protocol at the DRI level, so it's not really a good option when all we want to fix is backwards-compatibility with pre-modifiers userspace. Given that glamor also doesn't have any synchronization points, I don't see how I can implement the de-tiling blit reliably. I was wondering if it shouldn't be possible to flush the framebuffer resource (and perform the blit) at presentation time, but I couldn't find a good entry point to do this. One other solution that occurred to me was to reintroduce an old IOCTL that we used to have in the Tegra DRM driver. That IOCTL was meant to attach tiling meta data to an imported buffer and was basically a simplified, driver-specific way of doing framebuffer modifiers. That's a very ugly solution, but it would allow us to be backwards-compatible with pre-modifiers userspace and even use an optimal path for rendering and scanning out. The only prerequisite would be that the driver IOCTL was implemented and that a recent enough Mesa was used to make use of it. I don't like this very much because framebuffer modifiers are a much more generic solution, but all of the other options above are pretty much just as ugly. One other idea that I haven't explored yet is to be a little more clever about the export/import dance that we do for buffers. Currently we export/import at allocation time, and that seems to cause a bit of a problem, like the lack of valid GEM handles for some buffers (such as in the glxgears fullscreen use-case discussed above). I wonder if perhaps deferring the export/import dance until the handles are actually required may be a better way to do this. With such a solution, even if a buffer is allocated for scanout, it won't actually be imported/exported if the client ends up being composited onto the root window. Import and export would be limited to buffers that truly are going to be used for drmModeAddFB2(). I'll give that a shot and see if that gets me closer to my goal. Thierry
On Tue, Aug 18, 2020 at 04:37:51PM +0200, Thierry Reding wrote: > On Fri, Aug 14, 2020 at 07:25:17PM +0200, Daniel Vetter wrote: > > On Fri, Aug 14, 2020 at 7:17 PM Daniel Stone <daniel@fooishbar.org> wrote: > > > > > > Hi, > > > > > > On Fri, 14 Aug 2020 at 17:22, Thierry Reding <thierry.reding@gmail.com> wrote: > > > > I suspect that the reason why this works in X but not in Wayland is > > > > because X passes the right usage flags, whereas Weston may not. But I'll > > > > have to investigate more in order to be sure. > > > > > > Weston allocates its own buffers for displaying the result of > > > composition through GBM with USE_SCANOUT, which is definitely correct. > > > > > > Wayland clients (common to all compositors, in Mesa's > > > src/egl/drivers/dri2/platform_wayland.c) allocate with USE_SHARED but > > > _not_ USE_SCANOUT, which is correct in that they are guaranteed to be > > > shared, but not guaranteed to be scanned out. The expectation is that > > > non-scanout-compatible buffers would be rejected by gbm_bo_import if > > > not drmModeAddFB2. > > > > > > One difference between Weston and all other compositors (GNOME Shell, > > > KWin, Sway, etc) is that Weston uses KMS planes for composition when > > > it can (i.e. when gbm_bo_import from dmabuf + drmModeAddFB2 from > > > gbm_bo handle + atomic check succeed), but the other compositors only > > > use the GPU. So if you have different assumptions about the layout of > > > imported buffers between the GPU and KMS, that would explain a fair > > > bit. > > > > Yeah non-modifiered multi-gpu (of any kind) is pretty much hopeless I > > think. I guess the only option is if the tegra mesa driver forces > > linear and an extra copy on everything that's USE_SHARED or > > USE_SCANOUT. > > I ended up trying this, but this fails for the X case, unfortunately, > because there doesn't seem to be a good synchronization point at which > the de-tiling blit could be done. Weston and kmscube end up calling a > gallium driver's ->flush_resource() implementation, but that never > happens for X and glamor. > > But after looking into this some more, I don't think that's even the > problem that we're facing here. The root of the problem that causes the > glxgears crash that Karol was originally reporting is because we end up > allocating the glxgears pixmaps using the dri3 loader from Mesa. But the > dri3 loader will unconditionally pass both __DRI_IMAGE_USE_SHARE and > __DRI_IMAGE_USE_SCANOUT, irrespective of whether the buffer will end up > being scanned out directly or whether it will be composited onto the > root window. > > What exactly happens depends on whether I run glxgears in fullscreen > mode or windowed mode. In windowed mode, the glxgears buffers will be > composited onto the root window, so there's no need for the buffers to > be scanout-capable. If I modify the dri3 loader to not pass those flags > I can make this work just fine. > > When I run glxgears in fullscreen mode, the modesetting driver ends up > wanting to display the glxgears buffer directly on screen, without > compositing it onto the root window. This ends up working if I leave out > the _USE_SHARE and _USE_SCANOUT flags, but I notice that the kernel then > complains about being unable to create a framebuffer, which in turn is > caused by the fact that those buffers are not exported (the Tegra Mesa > driver only exports/imports buffers that are meant for scanout, under > the assumption that those are the only ones that will ever need to be > used by KMS) and therefore Tegra DRM doesn't have a valid handle for > them. > > So I think an ideal solution would probably be for glxgears to somehow > pass better usage information when allocating buffers, but I suspect > that that's just not possible, or would be way too much work and require > additional protocol at the DRI level, so it's not really a good option > when all we want to fix is backwards-compatibility with pre-modifiers > userspace. > > Given that glamor also doesn't have any synchronization points, I don't > see how I can implement the de-tiling blit reliably. I was wondering if > it shouldn't be possible to flush the framebuffer resource (and perform > the blit) at presentation time, but I couldn't find a good entry point > to do this. > > One other solution that occurred to me was to reintroduce an old IOCTL > that we used to have in the Tegra DRM driver. That IOCTL was meant to > attach tiling meta data to an imported buffer and was basically a > simplified, driver-specific way of doing framebuffer modifiers. That's > a very ugly solution, but it would allow us to be backwards-compatible > with pre-modifiers userspace and even use an optimal path for rendering > and scanning out. The only prerequisite would be that the driver IOCTL > was implemented and that a recent enough Mesa was used to make use of > it. I don't like this very much because framebuffer modifiers are a much > more generic solution, but all of the other options above are pretty > much just as ugly. > > One other idea that I haven't explored yet is to be a little more clever > about the export/import dance that we do for buffers. Currently we > export/import at allocation time, and that seems to cause a bit of a > problem, like the lack of valid GEM handles for some buffers (such as in > the glxgears fullscreen use-case discussed above). I wonder if perhaps > deferring the export/import dance until the handles are actually > required may be a better way to do this. With such a solution, even if a > buffer is allocated for scanout, it won't actually be imported/exported > if the client ends up being composited onto the root window. Import and > export would be limited to buffers that truly are going to be used for > drmModeAddFB2(). I'll give that a shot and see if that gets me closer to > my goal. (back from vacations) I think right thing to do is *shrug*, please use modifiers. They're meant to solve these kind of problems for real, adding more hacks to paper over userspace not using modifiers doesn't seem like a good idea. Wrt dri3, since we do client-side allocations and don't have modifiers, we have to pessimistically assume we'll get scanned out. Modifiers and relevant protocol is fixing this again, but for tegra where we essentially can't get this right that leaves us in a very tough spot. So yeah I think "use modifiers" is the answer. -Daniel
Hi, On Tue, 1 Sep 2020 at 08:13, Daniel Vetter <daniel@ffwll.ch> wrote: > I think right thing to do is *shrug*, please use modifiers. They're meant > to solve these kind of problems for real, adding more hacks to paper over > userspace not using modifiers doesn't seem like a good idea. > > Wrt dri3, since we do client-side allocations and don't have modifiers, we > have to pessimistically assume we'll get scanned out. Modifiers and > relevant protocol is fixing this again, but for tegra where we essentially > can't get this right that leaves us in a very tough spot. DRI3 has had modifiers for a couple/few years now ... Cheers, Daniel
On Tue, Sep 1, 2020 at 9:13 AM Daniel Vetter <daniel@ffwll.ch> wrote: > > On Tue, Aug 18, 2020 at 04:37:51PM +0200, Thierry Reding wrote: > > On Fri, Aug 14, 2020 at 07:25:17PM +0200, Daniel Vetter wrote: > > > On Fri, Aug 14, 2020 at 7:17 PM Daniel Stone <daniel@fooishbar.org> wrote: > > > > > > > > Hi, > > > > > > > > On Fri, 14 Aug 2020 at 17:22, Thierry Reding <thierry.reding@gmail.com> wrote: > > > > > I suspect that the reason why this works in X but not in Wayland is > > > > > because X passes the right usage flags, whereas Weston may not. But I'll > > > > > have to investigate more in order to be sure. > > > > > > > > Weston allocates its own buffers for displaying the result of > > > > composition through GBM with USE_SCANOUT, which is definitely correct. > > > > > > > > Wayland clients (common to all compositors, in Mesa's > > > > src/egl/drivers/dri2/platform_wayland.c) allocate with USE_SHARED but > > > > _not_ USE_SCANOUT, which is correct in that they are guaranteed to be > > > > shared, but not guaranteed to be scanned out. The expectation is that > > > > non-scanout-compatible buffers would be rejected by gbm_bo_import if > > > > not drmModeAddFB2. > > > > > > > > One difference between Weston and all other compositors (GNOME Shell, > > > > KWin, Sway, etc) is that Weston uses KMS planes for composition when > > > > it can (i.e. when gbm_bo_import from dmabuf + drmModeAddFB2 from > > > > gbm_bo handle + atomic check succeed), but the other compositors only > > > > use the GPU. So if you have different assumptions about the layout of > > > > imported buffers between the GPU and KMS, that would explain a fair > > > > bit. > > > > > > Yeah non-modifiered multi-gpu (of any kind) is pretty much hopeless I > > > think. I guess the only option is if the tegra mesa driver forces > > > linear and an extra copy on everything that's USE_SHARED or > > > USE_SCANOUT. > > > > I ended up trying this, but this fails for the X case, unfortunately, > > because there doesn't seem to be a good synchronization point at which > > the de-tiling blit could be done. Weston and kmscube end up calling a > > gallium driver's ->flush_resource() implementation, but that never > > happens for X and glamor. > > > > But after looking into this some more, I don't think that's even the > > problem that we're facing here. The root of the problem that causes the > > glxgears crash that Karol was originally reporting is because we end up > > allocating the glxgears pixmaps using the dri3 loader from Mesa. But the > > dri3 loader will unconditionally pass both __DRI_IMAGE_USE_SHARE and > > __DRI_IMAGE_USE_SCANOUT, irrespective of whether the buffer will end up > > being scanned out directly or whether it will be composited onto the > > root window. > > > > What exactly happens depends on whether I run glxgears in fullscreen > > mode or windowed mode. In windowed mode, the glxgears buffers will be > > composited onto the root window, so there's no need for the buffers to > > be scanout-capable. If I modify the dri3 loader to not pass those flags > > I can make this work just fine. > > > > When I run glxgears in fullscreen mode, the modesetting driver ends up > > wanting to display the glxgears buffer directly on screen, without > > compositing it onto the root window. This ends up working if I leave out > > the _USE_SHARE and _USE_SCANOUT flags, but I notice that the kernel then > > complains about being unable to create a framebuffer, which in turn is > > caused by the fact that those buffers are not exported (the Tegra Mesa > > driver only exports/imports buffers that are meant for scanout, under > > the assumption that those are the only ones that will ever need to be > > used by KMS) and therefore Tegra DRM doesn't have a valid handle for > > them. > > > > So I think an ideal solution would probably be for glxgears to somehow > > pass better usage information when allocating buffers, but I suspect > > that that's just not possible, or would be way too much work and require > > additional protocol at the DRI level, so it's not really a good option > > when all we want to fix is backwards-compatibility with pre-modifiers > > userspace. > > > > Given that glamor also doesn't have any synchronization points, I don't > > see how I can implement the de-tiling blit reliably. I was wondering if > > it shouldn't be possible to flush the framebuffer resource (and perform > > the blit) at presentation time, but I couldn't find a good entry point > > to do this. > > > > One other solution that occurred to me was to reintroduce an old IOCTL > > that we used to have in the Tegra DRM driver. That IOCTL was meant to > > attach tiling meta data to an imported buffer and was basically a > > simplified, driver-specific way of doing framebuffer modifiers. That's > > a very ugly solution, but it would allow us to be backwards-compatible > > with pre-modifiers userspace and even use an optimal path for rendering > > and scanning out. The only prerequisite would be that the driver IOCTL > > was implemented and that a recent enough Mesa was used to make use of > > it. I don't like this very much because framebuffer modifiers are a much > > more generic solution, but all of the other options above are pretty > > much just as ugly. > > > > One other idea that I haven't explored yet is to be a little more clever > > about the export/import dance that we do for buffers. Currently we > > export/import at allocation time, and that seems to cause a bit of a > > problem, like the lack of valid GEM handles for some buffers (such as in > > the glxgears fullscreen use-case discussed above). I wonder if perhaps > > deferring the export/import dance until the handles are actually > > required may be a better way to do this. With such a solution, even if a > > buffer is allocated for scanout, it won't actually be imported/exported > > if the client ends up being composited onto the root window. Import and > > export would be limited to buffers that truly are going to be used for > > drmModeAddFB2(). I'll give that a shot and see if that gets me closer to > > my goal. > > (back from vacations) > > I think right thing to do is *shrug*, please use modifiers. They're meant > to solve these kind of problems for real, adding more hacks to paper over > userspace not using modifiers doesn't seem like a good idea. > > Wrt dri3, since we do client-side allocations and don't have modifiers, we > have to pessimistically assume we'll get scanned out. Modifiers and > relevant protocol is fixing this again, but for tegra where we essentially > can't get this right that leaves us in a very tough spot. > > So yeah I think "use modifiers" is the answer. > -Daniel Right.. the issue is just that we don't have any X release fixing it and some compositors (mutter) don't do the right thing by default either :/ I will ask around for mutter, but for X we really need to do a release I think, just I've heard about regressions we need to fix first. > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch >
On 9/1/20 3:59 AM, Karol Herbst wrote: > On Tue, Sep 1, 2020 at 9:13 AM Daniel Vetter <daniel@ffwll.ch> wrote: >> >> On Tue, Aug 18, 2020 at 04:37:51PM +0200, Thierry Reding wrote: >>> On Fri, Aug 14, 2020 at 07:25:17PM +0200, Daniel Vetter wrote: >>>> On Fri, Aug 14, 2020 at 7:17 PM Daniel Stone <daniel@fooishbar.org> wrote: >>>>> >>>>> Hi, >>>>> >>>>> On Fri, 14 Aug 2020 at 17:22, Thierry Reding <thierry.reding@gmail.com> wrote: >>>>>> I suspect that the reason why this works in X but not in Wayland is >>>>>> because X passes the right usage flags, whereas Weston may not. But I'll >>>>>> have to investigate more in order to be sure. >>>>> >>>>> Weston allocates its own buffers for displaying the result of >>>>> composition through GBM with USE_SCANOUT, which is definitely correct. >>>>> >>>>> Wayland clients (common to all compositors, in Mesa's >>>>> src/egl/drivers/dri2/platform_wayland.c) allocate with USE_SHARED but >>>>> _not_ USE_SCANOUT, which is correct in that they are guaranteed to be >>>>> shared, but not guaranteed to be scanned out. The expectation is that >>>>> non-scanout-compatible buffers would be rejected by gbm_bo_import if >>>>> not drmModeAddFB2. >>>>> >>>>> One difference between Weston and all other compositors (GNOME Shell, >>>>> KWin, Sway, etc) is that Weston uses KMS planes for composition when >>>>> it can (i.e. when gbm_bo_import from dmabuf + drmModeAddFB2 from >>>>> gbm_bo handle + atomic check succeed), but the other compositors only >>>>> use the GPU. So if you have different assumptions about the layout of >>>>> imported buffers between the GPU and KMS, that would explain a fair >>>>> bit. >>>> >>>> Yeah non-modifiered multi-gpu (of any kind) is pretty much hopeless I >>>> think. I guess the only option is if the tegra mesa driver forces >>>> linear and an extra copy on everything that's USE_SHARED or >>>> USE_SCANOUT. >>> >>> I ended up trying this, but this fails for the X case, unfortunately, >>> because there doesn't seem to be a good synchronization point at which >>> the de-tiling blit could be done. Weston and kmscube end up calling a >>> gallium driver's ->flush_resource() implementation, but that never >>> happens for X and glamor. >>> >>> But after looking into this some more, I don't think that's even the >>> problem that we're facing here. The root of the problem that causes the >>> glxgears crash that Karol was originally reporting is because we end up >>> allocating the glxgears pixmaps using the dri3 loader from Mesa. But the >>> dri3 loader will unconditionally pass both __DRI_IMAGE_USE_SHARE and >>> __DRI_IMAGE_USE_SCANOUT, irrespective of whether the buffer will end up >>> being scanned out directly or whether it will be composited onto the >>> root window. >>> >>> What exactly happens depends on whether I run glxgears in fullscreen >>> mode or windowed mode. In windowed mode, the glxgears buffers will be >>> composited onto the root window, so there's no need for the buffers to >>> be scanout-capable. If I modify the dri3 loader to not pass those flags >>> I can make this work just fine. >>> >>> When I run glxgears in fullscreen mode, the modesetting driver ends up >>> wanting to display the glxgears buffer directly on screen, without >>> compositing it onto the root window. This ends up working if I leave out >>> the _USE_SHARE and _USE_SCANOUT flags, but I notice that the kernel then >>> complains about being unable to create a framebuffer, which in turn is >>> caused by the fact that those buffers are not exported (the Tegra Mesa >>> driver only exports/imports buffers that are meant for scanout, under >>> the assumption that those are the only ones that will ever need to be >>> used by KMS) and therefore Tegra DRM doesn't have a valid handle for >>> them. >>> >>> So I think an ideal solution would probably be for glxgears to somehow >>> pass better usage information when allocating buffers, but I suspect >>> that that's just not possible, or would be way too much work and require >>> additional protocol at the DRI level, so it's not really a good option >>> when all we want to fix is backwards-compatibility with pre-modifiers >>> userspace. >>> >>> Given that glamor also doesn't have any synchronization points, I don't >>> see how I can implement the de-tiling blit reliably. I was wondering if >>> it shouldn't be possible to flush the framebuffer resource (and perform >>> the blit) at presentation time, but I couldn't find a good entry point >>> to do this. >>> >>> One other solution that occurred to me was to reintroduce an old IOCTL >>> that we used to have in the Tegra DRM driver. That IOCTL was meant to >>> attach tiling meta data to an imported buffer and was basically a >>> simplified, driver-specific way of doing framebuffer modifiers. That's >>> a very ugly solution, but it would allow us to be backwards-compatible >>> with pre-modifiers userspace and even use an optimal path for rendering >>> and scanning out. The only prerequisite would be that the driver IOCTL >>> was implemented and that a recent enough Mesa was used to make use of >>> it. I don't like this very much because framebuffer modifiers are a much >>> more generic solution, but all of the other options above are pretty >>> much just as ugly. >>> >>> One other idea that I haven't explored yet is to be a little more clever >>> about the export/import dance that we do for buffers. Currently we >>> export/import at allocation time, and that seems to cause a bit of a >>> problem, like the lack of valid GEM handles for some buffers (such as in >>> the glxgears fullscreen use-case discussed above). I wonder if perhaps >>> deferring the export/import dance until the handles are actually >>> required may be a better way to do this. With such a solution, even if a >>> buffer is allocated for scanout, it won't actually be imported/exported >>> if the client ends up being composited onto the root window. Import and >>> export would be limited to buffers that truly are going to be used for >>> drmModeAddFB2(). I'll give that a shot and see if that gets me closer to >>> my goal. >> >> (back from vacations) >> >> I think right thing to do is *shrug*, please use modifiers. They're meant >> to solve these kind of problems for real, adding more hacks to paper over >> userspace not using modifiers doesn't seem like a good idea. >> >> Wrt dri3, since we do client-side allocations and don't have modifiers, we >> have to pessimistically assume we'll get scanned out. Modifiers and >> relevant protocol is fixing this again, but for tegra where we essentially >> can't get this right that leaves us in a very tough spot. >> >> So yeah I think "use modifiers" is the answer. >> -Daniel > > Right.. the issue is just that we don't have any X release fixing it > and some compositors (mutter) don't do the right thing by default > either :/ I will ask around for mutter, but for X we really need to do > a release I think, just I've heard about regressions we need to fix > first. Yes, I have at least one more fix for modifiers that I need to write up on top of whatever's at ToT for X before someone cuts a release, as suggested (much) earlier in this thread. Thanks, -James >> -- >> Daniel Vetter >> Software Engineer, Intel Corporation >> http://blog.ffwll.ch >> >