Message ID | 20221013175650.1769399-1-jonathan.cavitt@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | drm/i915/ttm: Fix access_memory null pointer exception |
On 13/10/2022 18:56, Jonathan Cavitt wrote:
> i915_ttm_to_gem can return a NULL pointer, which is
> dereferenced in i915_ttm_access_memory without first
> checking if it is NULL. Inspecting
> i915_ttm_io_mem_reserve, it appears the correct
> behavior in this case is to return -EINVAL.

The GEM object has already been dereferenced before this point, if you look at the caller (vm_access_ttm). The NULL obj thing is to identify "ttm ghost objects", and I don't think a normal userspace object can suddenly become one (access_memory comes from ptrace). AFAIK ghost objects are just for temporarily hanging on to some memory/state, while the dma-resv is busy. In the places where ttm is the one giving us the object, then it might be possible to see these types of objects, since ttm could in theory pass one in (like during eviction).

>
> Fixes: 26b15eb0 ("drm/i915/ttm: implement access_memory")
> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
> Suggested-by: John C Harrison <John.C.Harrison@intel.com>
> CC: Matthew Auld <matthew.auld@intel.com>
> CC: Andrzej Hajda <andrzej.hajda@intel.com>
> CC: Nirmoy Das <nirmoy.das@intel.com>
> CC: Andi Shyti <andi.shyti@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> index d63f30efd631..b569624f2ed9 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> @@ -704,11 +704,16 @@ static int i915_ttm_access_memory(struct ttm_buffer_object *bo,
>  				  int len, int write)
>  {
>  	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
> -	resource_size_t iomap = obj->mm.region->iomap.base -
> -		obj->mm.region->region.start;
> +	resource_size_t iomap;
>  	unsigned long page = offset >> PAGE_SHIFT;
>  	unsigned long bytes_left = len;
>
> +	if (!obj)
> +		return -EINVAL;
> +
> +	iomap = obj->mm.region->iomap.base -
> +		obj->mm.region->region.start;
> +
>  	/*
>  	 * TODO: For now just let it fail if the resource is non-mappable,
>  	 * otherwise we need to perform the memcpy from the gpu here, without
Hi Jonathan, On Thu, Oct 13, 2022 at 10:56:50AM -0700, Jonathan Cavitt wrote: > i915_ttm_to_gem can return a NULL pointer, which is > dereferenced in i915_ttm_access_memory without first > checking if it is NULL. Inspecting > i915_ttm_io_mem_reserve, it appears the correct > behavior in this case is to return -EINVAL. > > Fixes: 26b15eb0 ("drm/i915/ttm: implement access_memory") > Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com> > Suggested-by: John C Harrison <John.C.Harrison@intel.com> > CC: Matthew Auld <matthew.auld@intel.com> > CC: Andrzej Hajda <andrzej.hajda@intel.com> > CC: Nirmoy Das <nirmoy.das@intel.com> > CC: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Thanks, Andi > --- > drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c > index d63f30efd631..b569624f2ed9 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c > @@ -704,11 +704,16 @@ static int i915_ttm_access_memory(struct ttm_buffer_object *bo, > int len, int write) > { > struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo); > - resource_size_t iomap = obj->mm.region->iomap.base - > - obj->mm.region->region.start; > + resource_size_t iomap; > unsigned long page = offset >> PAGE_SHIFT; > unsigned long bytes_left = len; > > + if (!obj) > + return -EINVAL; > + > + iomap = obj->mm.region->iomap.base - > + obj->mm.region->region.start; > + > /* > * TODO: For now just let it fail if the resource is non-mappable, > * otherwise we need to perform the memcpy from the gpu here, without > -- > 2.25.1
On Fri, Oct 14, 2022 at 09:39:52AM +0100, Matthew Auld wrote:
> On 13/10/2022 18:56, Jonathan Cavitt wrote:
> > i915_ttm_to_gem can return a NULL pointer, which is
> > dereferenced in i915_ttm_access_memory without first
> > checking if it is NULL. Inspecting
> > i915_ttm_io_mem_reserve, it appears the correct
> > behavior in this case is to return -EINVAL.
>
> The GEM object has already been dereferenced before this point, if you look
> at the caller (vm_access_ttm). The NULL obj thing is to identify "ttm ghost
> objects", and I don't think a normal userspace object can suddenly become one
> (access_memory comes from ptrace). AFAIK ghost objects are just for
> temporarily hanging on to some memory/state, while the dma-resv is busy. In
> the places where ttm is the one giving us the object, then it might be
> possible to see these types of objects, since ttm could in theory pass one
> in (like during eviction).

True, but since from a code perspective we can still receive NULL, I think the check is correct. Perhaps we could use:

	if (unlikely(!obj))
		return -EINVAL;

Andi

> > Fixes: 26b15eb0 ("drm/i915/ttm: implement access_memory")
> > Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
> > Suggested-by: John C Harrison <John.C.Harrison@intel.com>
> > CC: Matthew Auld <matthew.auld@intel.com>
> > CC: Andrzej Hajda <andrzej.hajda@intel.com>
> > CC: Nirmoy Das <nirmoy.das@intel.com>
> > CC: Andi Shyti <andi.shyti@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 9 +++++++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> > index d63f30efd631..b569624f2ed9 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> > @@ -704,11 +704,16 @@ static int i915_ttm_access_memory(struct ttm_buffer_object *bo,
> >  				  int len, int write)
> >  {
> >  	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
> > -	resource_size_t iomap = obj->mm.region->iomap.base -
> > -		obj->mm.region->region.start;
> > +	resource_size_t iomap;
> >  	unsigned long page = offset >> PAGE_SHIFT;
> >  	unsigned long bytes_left = len;
> > +	if (!obj)
> > +		return -EINVAL;
> > +
> > +	iomap = obj->mm.region->iomap.base -
> > +		obj->mm.region->region.start;
> > +
> >  	/*
> >  	 * TODO: For now just let it fail if the resource is non-mappable,
> >  	 * otherwise we need to perform the memcpy from the gpu here, without
On 13.10.2022 19:56, Jonathan Cavitt wrote:
> i915_ttm_to_gem can return a NULL pointer, which is
> dereferenced in i915_ttm_access_memory without first
> checking if it is NULL. Inspecting
> i915_ttm_io_mem_reserve, it appears the correct
> behavior in this case is to return -EINVAL.
>
> Fixes: 26b15eb0 ("drm/i915/ttm: implement access_memory")
> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
> Suggested-by: John C Harrison <John.C.Harrison@intel.com>
> CC: Matthew Auld <matthew.auld@intel.com>
> CC: Andrzej Hajda <andrzej.hajda@intel.com>
> CC: Nirmoy Das <nirmoy.das@intel.com>
> CC: Andi Shyti <andi.shyti@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> index d63f30efd631..b569624f2ed9 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> @@ -704,11 +704,16 @@ static int i915_ttm_access_memory(struct ttm_buffer_object *bo,
>  				  int len, int write)
>  {
>  	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
> -	resource_size_t iomap = obj->mm.region->iomap.base -
> -		obj->mm.region->region.start;
> +	resource_size_t iomap;
>  	unsigned long page = offset >> PAGE_SHIFT;
>  	unsigned long bytes_left = len;
>
> +	if (!obj)
> +		return -EINVAL;
> +
> +	iomap = obj->mm.region->iomap.base -
> +		obj->mm.region->region.start;
> +

There are 4 occurrences of this subtraction in the code, enough for a helper :) ?

Anyway:
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>

Regards
Andrzej

>  	/*
>  	 * TODO: For now just let it fail if the resource is non-mappable,
>  	 * otherwise we need to perform the memcpy from the gpu here, without
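
For illustration only, the helper Andrzej hints at might look roughly like the sketch below. It is a user-space mock, not i915 code: the struct layouts are cut down to the two fields the expression touches, `resource_size_t` is assumed to be 64-bit, and the name `region_iomap_offset()` is hypothetical.

```c
#include <stdint.h>

/* Stand-in for the kernel's resource_size_t (assumption: 64-bit). */
typedef uint64_t resource_size_t;

/* Cut-down mocks holding only the fields used by the patch's expression. */
struct mock_iomap {
	resource_size_t base;
};

struct mock_resource {
	resource_size_t start;
};

struct mock_memory_region {
	struct mock_iomap iomap;
	struct mock_resource region;
};

/*
 * Hypothetical helper replacing the open-coded
 *   obj->mm.region->iomap.base - obj->mm.region->region.start
 * so that every caller shares one definition of the delta.
 */
static inline resource_size_t
region_iomap_offset(const struct mock_memory_region *mr)
{
	return mr->iomap.base - mr->region.start;
}
```

With something like this, each of the four call sites would shrink to a single call, and the declaration in i915_ttm_access_memory could keep its initializer.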
Hi Matt

On 10/14/2022 10:39 AM, Matthew Auld wrote:
> On 13/10/2022 18:56, Jonathan Cavitt wrote:
>> i915_ttm_to_gem can return a NULL pointer, which is
>> dereferenced in i915_ttm_access_memory without first
>> checking if it is NULL. Inspecting
>> i915_ttm_io_mem_reserve, it appears the correct
>> behavior in this case is to return -EINVAL.
>
> The GEM object has already been dereferenced before this point, if you
> look at the caller (vm_access_ttm). The NULL obj thing is to identify
> "ttm ghost objects", and I don't think a normal userspace object can
> suddenly become one (access_memory comes from ptrace). AFAIK ghost
> objects are just for temporarily hanging on to some memory/state,
> while the dma-resv is busy. In the places where ttm is the one giving
> us the object, then it might be possible to see these types of
> objects, since ttm could in theory pass one in (like during eviction).

Yes, we should not hit this. Thanks for the nice "ttm ghost objects" reminder :)

I think we can still have this check to avoid code analysis tool warnings. What do you think?

Thanks,
Nirmoy

>
>>
>> Fixes: 26b15eb0 ("drm/i915/ttm: implement access_memory")
>> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
>> Suggested-by: John C Harrison <John.C.Harrison@intel.com>
>> CC: Matthew Auld <matthew.auld@intel.com>
>> CC: Andrzej Hajda <andrzej.hajda@intel.com>
>> CC: Nirmoy Das <nirmoy.das@intel.com>
>> CC: Andi Shyti <andi.shyti@linux.intel.com>
>> ---
>>  drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 9 +++++++--
>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>> index d63f30efd631..b569624f2ed9 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>> @@ -704,11 +704,16 @@ static int i915_ttm_access_memory(struct
>> ttm_buffer_object *bo,
>>  				  int len, int write)
>>  {
>>  	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
>> -	resource_size_t iomap = obj->mm.region->iomap.base -
>> -		obj->mm.region->region.start;
>> +	resource_size_t iomap;
>>  	unsigned long page = offset >> PAGE_SHIFT;
>>  	unsigned long bytes_left = len;
>> +	if (!obj)
>> +		return -EINVAL;
>> +
>> +	iomap = obj->mm.region->iomap.base -
>> +		obj->mm.region->region.start;
>> +
>>  	/*
>>  	 * TODO: For now just let it fail if the resource is non-mappable,
>>  	 * otherwise we need to perform the memcpy from the gpu here,
>> without
On 14/10/2022 09:56, Andi Shyti wrote: > On Fri, Oct 14, 2022 at 09:39:52AM +0100, Matthew Auld wrote: >> On 13/10/2022 18:56, Jonathan Cavitt wrote: >>> i915_ttm_to_gem can return a NULL pointer, which is >>> dereferenced in i915_ttm_access_memory without first >>> checking if it is NULL. Inspecting >>> i915_ttm_io_mem_reserve, it appears the correct >>> behavior in this case is to return -EINVAL. >> >> The GEM object has already been dereferenced before this point, if you look >> at the caller (vm_access_ttm). The NULL obj thing is to identify "ttm ghost >> objects", and I don't think a normal userpace object can suddenly become one >> (access_memory comes from ptrace). AFAIK ghost objects are just for >> temporarily hanging on to some memory/state, while the dma-resv is busy. In >> the places where ttm is the one giving us the object, then it might be >> possible to see these types of objects, since ttm could in theory pass one >> in (like during eviction). > > True that, but because from a code persepctive we can still receive > NULL, I think the check is correct, perhaps we could: > > if (unlikely(!obj)) > return -EINVAL; Hmm, so that will dereference some pointer, and then later check if it is NULL here? Or do you mean to move this into vm_access()? If we are given a "ghost object" for ptrace this would likely mean we have a very nasty bug somewhere (unless I'm misunderstanding something), and so returning a normal user error here doesn't seem right to me (maybe this just hides the issue)? Letting it crash seems fine to me tbh. It also makes the code harder to understand IMO, because looking at this it now suggests that it is somehow possible to have a "ghost object" here. Also there are a fair few places calling i915_ttm_to_gem() which already don't check for NULL, since it should be impossible, like it should be here. 
> > Andi > >>> Fixes: 26b15eb0 ("drm/i915/ttm: implement access_memory") >>> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com> >>> Suggested-by: John C Harrison <John.C.Harrison@intel.com> >>> CC: Matthew Auld <matthew.auld@intel.com> >>> CC: Andrzej Hajda <andrzej.hajda@intel.com> >>> CC: Nirmoy Das <nirmoy.das@intel.com> >>> CC: Andi Shyti <andi.shyti@linux.intel.com> >>> --- >>> drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 9 +++++++-- >>> 1 file changed, 7 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c >>> index d63f30efd631..b569624f2ed9 100644 >>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c >>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c >>> @@ -704,11 +704,16 @@ static int i915_ttm_access_memory(struct ttm_buffer_object *bo, >>> int len, int write) >>> { >>> struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo); >>> - resource_size_t iomap = obj->mm.region->iomap.base - >>> - obj->mm.region->region.start; >>> + resource_size_t iomap; >>> unsigned long page = offset >> PAGE_SHIFT; >>> unsigned long bytes_left = len; >>> + if (!obj) >>> + return -EINVAL; >>> + >>> + iomap = obj->mm.region->iomap.base - >>> + obj->mm.region->region.start; >>> + >>> /* >>> * TODO: For now just let it fail if the resource is non-mappable, >>> * otherwise we need to perform the memcpy from the gpu here, without
Hi,

A couple of nitpicks for the benefit of coding style.

On 13/10/2022 18:56, Jonathan Cavitt wrote:
> i915_ttm_to_gem can return a NULL pointer, which is
> dereferenced in i915_ttm_access_memory without first
> checking if it is NULL. Inspecting
> i915_ttm_io_mem_reserve, it appears the correct
> behavior in this case is to return -EINVAL.

Too narrow wrap - see kernel's submitting-patches.rst:

  - The body of the explanation, line wrapped at 75 columns, which will
    be copied to the permanent changelog to describe this patch.

>
> Fixes: 26b15eb0 ("drm/i915/ttm: implement access_memory")
> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
> Suggested-by: John C Harrison <John.C.Harrison@intel.com>
> CC: Matthew Auld <matthew.auld@intel.com>
> CC: Andrzej Hajda <andrzej.hajda@intel.com>
> CC: Nirmoy Das <nirmoy.das@intel.com>
> CC: Andi Shyti <andi.shyti@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> index d63f30efd631..b569624f2ed9 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> @@ -704,11 +704,16 @@ static int i915_ttm_access_memory(struct ttm_buffer_object *bo,
>  				  int len, int write)
>  {
>  	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
> -	resource_size_t iomap = obj->mm.region->iomap.base -
> -		obj->mm.region->region.start;
> +	resource_size_t iomap;
>  	unsigned long page = offset >> PAGE_SHIFT;
>  	unsigned long bytes_left = len;

We tend to re-order the declarations from long to narrow, where not too cumbersome, just for that extra readability.

>
> +	if (!obj)
> +		return -EINVAL;

Perhaps someone can update the vfunc kerneldoc, since it appears to be overly strict about the allowed return codes.

include/drm/ttm/ttm_device.h:

	/**
	 * Read/write memory buffers for ptrace access
	 *
	 * @bo: the BO to access
	 * @offset: the offset from the start of the BO
	 * @buf: pointer to source/destination buffer
	 * @len: number of bytes to copy
	 * @write: whether to read (0) from or write (non-0) to BO
	 *
	 * If successful, this function should return the number of
	 * bytes copied, -EIO otherwise. If the number of bytes
	 * returned is < len, the function may be called again with
	 * the remainder of the buffer to copy.
	 */
	int (*access_memory)(struct ttm_buffer_object *bo, unsigned long offset,
			     void *buf, int len, int write);

Regards,

Tvrtko

> +
> +	iomap = obj->mm.region->iomap.base -
> +		obj->mm.region->region.start;
> +
>  	/*
>  	 * TODO: For now just let it fail if the resource is non-mappable,
>  	 * otherwise we need to perform the memcpy from the gpu here, without
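
To make the quoted contract concrete, here is a small user-space sketch of it: bytes copied on success, -EIO otherwise, and a short copy that the caller follows up with the remainder. Nothing below is TTM code. `mock_buf`, `mock_access_memory()`, `mock_access_all()` and the 4-byte per-call limit are all made up for the example.

```c
#include <errno.h>
#include <string.h>

#define MOCK_BUF_SIZE 16
#define MOCK_CHUNK    4	/* assumed per-call limit, only to force short copies */

/* Hypothetical stand-in for a buffer object's backing store. */
struct mock_buf {
	unsigned char data[MOCK_BUF_SIZE];
};

/* Mirrors the quoted kerneldoc: bytes copied on success, -EIO otherwise. */
static int mock_access_memory(struct mock_buf *bo, unsigned long offset,
			      void *buf, int len, int write)
{
	if (len < 0 || offset > MOCK_BUF_SIZE ||
	    (unsigned long)len > MOCK_BUF_SIZE - offset)
		return -EIO;

	/* Copy at most one chunk, so the return value may be < len. */
	if (len > MOCK_CHUNK)
		len = MOCK_CHUNK;

	if (write)
		memcpy(bo->data + offset, buf, len);
	else
		memcpy(buf, bo->data + offset, len);

	return len;
}

/* Caller side: call again with the remainder until everything is copied. */
static int mock_access_all(struct mock_buf *bo, unsigned long offset,
			   void *buf, int len, int write)
{
	unsigned char *p = buf;
	int done = 0;

	while (done < len) {
		int ret = mock_access_memory(bo, offset + done, p + done,
					     len - done, write);

		if (ret < 0)
			return ret;	/* propagate the error code unchanged */
		if (ret == 0)
			break;		/* no progress, avoid spinning forever */
		done += ret;
	}

	return done;
}
```

A caller written this way forwards whatever negative value the callback returns, so in this sketch an -EINVAL from the callback would reach the caller just as -EIO does; the kerneldoc text simply does not mention it.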
On 14/10/2022 10:27, Das, Nirmoy wrote: > Hi Matt > > On 10/14/2022 10:39 AM, Matthew Auld wrote: >> On 13/10/2022 18:56, Jonathan Cavitt wrote: >>> i915_ttm_to_gem can return a NULL pointer, which is >>> dereferenced in i915_ttm_access_memory without first >>> checking if it is NULL. Inspecting >>> i915_ttm_io_mem_reserve, it appears the correct >>> behavior in this case is to return -EINVAL. >> >> The GEM object has already been dereferenced before this point, if you >> look at the caller (vm_access_ttm). The NULL obj thing is to identify >> "ttm ghost objects", and I don't think a normal userpace object can >> suddenly become one (access_memory comes from ptrace). AFAIK ghost >> objects are just for temporarily hanging on to some memory/state, >> while the dma-resv is busy. In the places where ttm is the one giving >> us the object, then it might be possible to see these types of >> objects, since ttm could in theory pass one in (like during eviction). > > > Yes, we should not hit this. Thanks for the nice "ttm ghost objects" > reminder :) > > > I think we can still have this check to avoid code analysis tool > warnings, what do you think ? IMHO I think it just makes it harder to understand the code, since conceptually it should be impossible, given how "ghost objects" actually work. Adding such a check gives the impression that it is somehow now possible to be given one here (like with eviction etc). AFAIK just letting it crash is fine, instead of littering the code with NULL checks for stuff that is never meant to be NULL and would be a driver bug. Also there are a bunch of other places not checking that i915_ttm_to_gem() returns NULL, so why just here? Did the code analysis tool find something? Also why doesn't it complain about vm_access_ttm(), which is the one actually calling access_memory() and is itself also doing i915_ttm_to_gem() and also not checking for NULL? 
> > > Thanks, > > Nirmoy > >> >>> >>> Fixes: 26b15eb0 ("drm/i915/ttm: implement access_memory") >>> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com> >>> Suggested-by: John C Harrison <John.C.Harrison@intel.com> >>> CC: Matthew Auld <matthew.auld@intel.com> >>> CC: Andrzej Hajda <andrzej.hajda@intel.com> >>> CC: Nirmoy Das <nirmoy.das@intel.com> >>> CC: Andi Shyti <andi.shyti@linux.intel.com> >>> --- >>> drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 9 +++++++-- >>> 1 file changed, 7 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c >>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c >>> index d63f30efd631..b569624f2ed9 100644 >>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c >>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c >>> @@ -704,11 +704,16 @@ static int i915_ttm_access_memory(struct >>> ttm_buffer_object *bo, >>> int len, int write) >>> { >>> struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo); >>> - resource_size_t iomap = obj->mm.region->iomap.base - >>> - obj->mm.region->region.start; >>> + resource_size_t iomap; >>> unsigned long page = offset >> PAGE_SHIFT; >>> unsigned long bytes_left = len; >>> + if (!obj) >>> + return -EINVAL; >>> + >>> + iomap = obj->mm.region->iomap.base - >>> + obj->mm.region->region.start; >>> + >>> /* >>> * TODO: For now just let it fail if the resource is non-mappable, >>> * otherwise we need to perform the memcpy from the gpu here, >>> without
Hi Matt,

On 10/14/2022 12:13 PM, Matthew Auld wrote:
> On 14/10/2022 10:27, Das, Nirmoy wrote:
>> Hi Matt
>>
>> On 10/14/2022 10:39 AM, Matthew Auld wrote:
>>> On 13/10/2022 18:56, Jonathan Cavitt wrote:
>>>> i915_ttm_to_gem can return a NULL pointer, which is
>>>> dereferenced in i915_ttm_access_memory without first
>>>> checking if it is NULL. Inspecting
>>>> i915_ttm_io_mem_reserve, it appears the correct
>>>> behavior in this case is to return -EINVAL.
>>>
>>> The GEM object has already been dereferenced before this point, if
>>> you look at the caller (vm_access_ttm). The NULL obj thing is to
>>> identify "ttm ghost objects", and I don't think a normal userspace
>>> object can suddenly become one (access_memory comes from ptrace).
>>> AFAIK ghost objects are just for temporarily hanging on to some
>>> memory/state, while the dma-resv is busy. In the places where ttm is
>>> the one giving us the object, then it might be possible to see these
>>> types of objects, since ttm could in theory pass one in (like during
>>> eviction).
>>
>>
>> Yes, we should not hit this. Thanks for the nice "ttm ghost objects"
>> reminder :)
>>
>>
>> I think we can still have this check to avoid code analysis tool
>> warnings, what do you think ?
>
> IMHO I think it just makes it harder to understand the code, since
> conceptually it should be impossible, given how "ghost objects"
> actually work. Adding such a check gives the impression that it is
> somehow now possible to be given one here (like with eviction etc).
> AFAIK just letting it crash is fine, instead of littering the code
> with NULL checks for stuff that is never meant to be NULL and would be
> a driver bug. Also there are a bunch of other places not checking that
> i915_ttm_to_gem() returns NULL, so why just here?

This is tricky because in some places we might receive NULL (from i915_ttm_to_gem) and in other places we might not. I also don't like the idea of sprinkling NULL checks everywhere.

I think the issue is that i915_ttm_to_gem returns NULL for a non-i915 BO. We should move the "if (bo->destroy != i915_ttm_bo_destroy)" check to the respective functions where we expect a ghost object. That should make the static code analyzer happy and also make it very clear which functions expect ghost objects.

> Did the code analysis tool find something? Also why doesn't it
> complain about vm_access_ttm(), which is the one actually calling
> access_memory() and is itself also doing i915_ttm_to_gem() and also
> not checking for NULL?

Yes, I think the patch idea came from our static code analyzer warning, but I can't seem to open the URL. I am also not sure why it doesn't complain about the other cases.

Thanks,
Nirmoy

>
>>
>>
>> Thanks,
>>
>> Nirmoy
>>
>>>
>>>>
>>>> Fixes: 26b15eb0 ("drm/i915/ttm: implement access_memory")
>>>> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
>>>> Suggested-by: John C Harrison <John.C.Harrison@intel.com>
>>>> CC: Matthew Auld <matthew.auld@intel.com>
>>>> CC: Andrzej Hajda <andrzej.hajda@intel.com>
>>>> CC: Nirmoy Das <nirmoy.das@intel.com>
>>>> CC: Andi Shyti <andi.shyti@linux.intel.com>
>>>> ---
>>>>  drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 9 +++++++--
>>>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>>> index d63f30efd631..b569624f2ed9 100644
>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>>> @@ -704,11 +704,16 @@ static int i915_ttm_access_memory(struct
>>>> ttm_buffer_object *bo,
>>>>  				  int len, int write)
>>>>  {
>>>>  	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
>>>> -	resource_size_t iomap = obj->mm.region->iomap.base -
>>>> -		obj->mm.region->region.start;
>>>> +	resource_size_t iomap;
>>>>  	unsigned long page = offset >> PAGE_SHIFT;
>>>>  	unsigned long bytes_left = len;
>>>> +	if (!obj)
>>>> +		return -EINVAL;
>>>> +
>>>> +	iomap = obj->mm.region->iomap.base -
>>>> +		obj->mm.region->region.start;
>>>> +
>>>>  	/*
>>>>  	 * TODO: For now just let it fail if the resource is
>>>> non-mappable,
>>>>  	 * otherwise we need to perform the memcpy from the gpu here,
>>>> without
On 14/10/2022 11:38, Das, Nirmoy wrote: > Hi Matt, > > On 10/14/2022 12:13 PM, Matthew Auld wrote: >> On 14/10/2022 10:27, Das, Nirmoy wrote: >>> Hi Matt >>> >>> On 10/14/2022 10:39 AM, Matthew Auld wrote: >>>> On 13/10/2022 18:56, Jonathan Cavitt wrote: >>>>> i915_ttm_to_gem can return a NULL pointer, which is >>>>> dereferenced in i915_ttm_access_memory without first >>>>> checking if it is NULL. Inspecting >>>>> i915_ttm_io_mem_reserve, it appears the correct >>>>> behavior in this case is to return -EINVAL. >>>> >>>> The GEM object has already been dereferenced before this point, if >>>> you look at the caller (vm_access_ttm). The NULL obj thing is to >>>> identify "ttm ghost objects", and I don't think a normal userpace >>>> object can suddenly become one (access_memory comes from ptrace). >>>> AFAIK ghost objects are just for temporarily hanging on to some >>>> memory/state, while the dma-resv is busy. In the places where ttm is >>>> the one giving us the object, then it might be possible to see these >>>> types of objects, since ttm could in theory pass one in (like during >>>> eviction). >>> >>> >>> Yes, we should not hit this. Thanks for the nice "ttm ghost objects" >>> reminder :) >>> >>> >>> I think we can still have this check to avoid code analysis tool >>> warnings, what do you think ? >> >> IMHO I think it just makes it harder to understand the code, since >> conceptually it should be impossible, given how "ghost objects" >> actually work. Adding such a check gives the impression that it is >> somehow now possible to be given one here (like with eviction etc). >> AFAIK just letting it crash is fine, instead of littering the code >> with NULL checks for stuff that is never meant to be NULL and would be >> a driver bug. Also there are a bunch of other places not checking that >> i915_ttm_to_gem() returns NULL, so why just here? > > This is tricky because some place we might receive NULL and some other > places we might not(from i915_ttm_to_gem). 
I also don't like the idea of > sprinkling NULL check everywhere. > > I think the issue is i915_ttm_to_gem returns NULL for non-i915 BO. We > should move "if (bo->destroy != i915_ttm_bo_destroy)" check to the > respective function where we > > expect ghost object. That should make the static code analyzer happy and > also makes it very clear which function expect ghost objects. Yeah, that sounds like a really nice idea to me. amdgpu looks to have something like amdgpu_bo_is_amdgpu_bo() for the spots that might be "ghost objects". Maybe we can add something like i915_ttm_is_ghost_bo() or similar for our needs. > > >> Did the code analysis tool find something? Also why doesn't it >> complain about vm_access_ttm(), which is the one actually calling >> access_memory() and is itself also doing i915_ttm_to_gem() and also >> not checking for NULL? > > > Yes, I think the patch idea came from our static code analyzer warning > but I can't seem to open the URL. I am also not sure why it doesn't > complain for other cases. 
> > > Thanks, > > Nirmoy > >> >>> >>> >>> Thanks, >>> >>> Nirmoy >>> >>>> >>>>> >>>>> Fixes: 26b15eb0 ("drm/i915/ttm: implement access_memory") >>>>> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com> >>>>> Suggested-by: John C Harrison <John.C.Harrison@intel.com> >>>>> CC: Matthew Auld <matthew.auld@intel.com> >>>>> CC: Andrzej Hajda <andrzej.hajda@intel.com> >>>>> CC: Nirmoy Das <nirmoy.das@intel.com> >>>>> CC: Andi Shyti <andi.shyti@linux.intel.com> >>>>> --- >>>>> drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 9 +++++++-- >>>>> 1 file changed, 7 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c >>>>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c >>>>> index d63f30efd631..b569624f2ed9 100644 >>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c >>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c >>>>> @@ -704,11 +704,16 @@ static int i915_ttm_access_memory(struct >>>>> ttm_buffer_object *bo, >>>>> int len, int write) >>>>> { >>>>> struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo); >>>>> - resource_size_t iomap = obj->mm.region->iomap.base - >>>>> - obj->mm.region->region.start; >>>>> + resource_size_t iomap; >>>>> unsigned long page = offset >> PAGE_SHIFT; >>>>> unsigned long bytes_left = len; >>>>> + if (!obj) >>>>> + return -EINVAL; >>>>> + >>>>> + iomap = obj->mm.region->iomap.base - >>>>> + obj->mm.region->region.start; >>>>> + >>>>> /* >>>>> * TODO: For now just let it fail if the resource is >>>>> non-mappable, >>>>> * otherwise we need to perform the memcpy from the gpu here, >>>>> without
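
As a rough picture of the split agreed on above, the sketch below separates "is this a ghost?" from "convert to the driver object". It is a user-space mock and every name in it is hypothetical. Only the idea of comparing the bo's destroy callback against the driver's own destructor comes from the thread, and the i915_ttm_is_ghost_bo() name floated above did not exist yet when this was written.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mock of a TTM buffer object: only the destroy callback matters here. */
struct mock_ttm_bo {
	void (*destroy)(struct mock_ttm_bo *bo);
};

/* Mock driver object embedding the TTM bo (layout is an assumption). */
struct mock_gem_object {
	struct mock_ttm_bo ttm;
	int id;
};

/* Destructor the driver installs on its own objects. */
static void mock_driver_bo_destroy(struct mock_ttm_bo *bo)
{
	(void)bo;
}

/* Some other destructor, standing in for what a ghost object carries. */
static void mock_ghost_bo_destroy(struct mock_ttm_bo *bo)
{
	bo->destroy = NULL;
}

/* Hypothetical predicate, in the spirit of the proposed helper. */
static bool mock_is_ghost_bo(const struct mock_ttm_bo *bo)
{
	return bo->destroy != mock_driver_bo_destroy;
}

/* Unconditional conversion: callers must already know bo is not a ghost. */
static struct mock_gem_object *mock_to_gem(struct mock_ttm_bo *bo)
{
	return (struct mock_gem_object *)
		((char *)bo - offsetof(struct mock_gem_object, ttm));
}

/*
 * A path where TTM hands the bo to the driver (eviction is the example
 * given in the thread) tests for ghosts explicitly before converting.
 */
static int mock_evict_path(struct mock_ttm_bo *bo)
{
	if (mock_is_ghost_bo(bo))
		return 0;	/* nothing driver-specific to do */

	return mock_to_gem(bo)->id;
}
```

Under this arrangement a path like access_memory would call the conversion directly with no NULL check, which states in code that a ghost object is not expected there.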
On 10/14/2022 12:52 PM, Matthew Auld wrote:
> On 14/10/2022 11:38, Das, Nirmoy wrote:
>> Hi Matt,
>>
>> On 10/14/2022 12:13 PM, Matthew Auld wrote:
>>> On 14/10/2022 10:27, Das, Nirmoy wrote:
>>>> Hi Matt
>>>>
>>>> On 10/14/2022 10:39 AM, Matthew Auld wrote:
>>>>> On 13/10/2022 18:56, Jonathan Cavitt wrote:
>>>>>> i915_ttm_to_gem can return a NULL pointer, which is
>>>>>> dereferenced in i915_ttm_access_memory without first
>>>>>> checking if it is NULL. Inspecting
>>>>>> i915_ttm_io_mem_reserve, it appears the correct
>>>>>> behavior in this case is to return -EINVAL.
>>>>>
>>>>> The GEM object has already been dereferenced before this point, if
>>>>> you look at the caller (vm_access_ttm). The NULL obj thing is to
>>>>> identify "ttm ghost objects", and I don't think a normal userspace
>>>>> object can suddenly become one (access_memory comes from ptrace).
>>>>> AFAIK ghost objects are just for temporarily hanging on to some
>>>>> memory/state, while the dma-resv is busy. In the places where ttm
>>>>> is the one giving us the object, then it might be possible to see
>>>>> these types of objects, since ttm could in theory pass one in
>>>>> (like during eviction).
>>>>
>>>> Yes, we should not hit this. Thanks for the nice "ttm ghost
>>>> objects" reminder :)
>>>>
>>>> I think we can still have this check to avoid code analysis tool
>>>> warnings, what do you think?
>>>
>>> IMHO I think it just makes it harder to understand the code, since
>>> conceptually it should be impossible, given how "ghost objects"
>>> actually work. Adding such a check gives the impression that it is
>>> somehow now possible to be given one here (like with eviction etc).
>>> AFAIK just letting it crash is fine, instead of littering the code
>>> with NULL checks for stuff that is never meant to be NULL and would
>>> be a driver bug. Also there are a bunch of other places not checking
>>> that i915_ttm_to_gem() returns NULL, so why just here?
>>
>> This is tricky because some place we might receive NULL and some
>> other places we might not (from i915_ttm_to_gem). I also don't like
>> the idea of sprinkling NULL check everywhere.
>>
>> I think the issue is i915_ttm_to_gem returns NULL for non-i915 BO.
>> We should move "if (bo->destroy != i915_ttm_bo_destroy)" check to the
>> respective function where we expect ghost object. That should make
>> the static code analyzer happy and also makes it very clear which
>> function expect ghost objects.
>
> Yeah, that sounds like a really nice idea to me. amdgpu looks to have
> something like amdgpu_bo_is_amdgpu_bo() for the spots that might be
> "ghost objects". Maybe we can add something like
> i915_ttm_is_ghost_bo() or similar for our needs.

I will prepare a patch for that then.

Thanks,
Nirmoy

>
>>
>>> Did the code analysis tool find something? Also why doesn't it
>>> complain about vm_access_ttm(), which is the one actually calling
>>> access_memory() and is itself also doing i915_ttm_to_gem() and also
>>> not checking for NULL?
>>
>> Yes, I think the patch idea came from our static code analyzer
>> warning but I can't seem to open the URL. I am also not sure why it
>> doesn't complain for other cases.
>>
>> Thanks,
>>
>> Nirmoy
>>
>>>
>>>>
>>>> Thanks,
>>>>
>>>> Nirmoy
>>>>
>>>>>
>>>>>>
>>>>>> Fixes: 26b15eb0 ("drm/i915/ttm: implement access_memory")
>>>>>> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
>>>>>> Suggested-by: John C Harrison <John.C.Harrison@intel.com>
>>>>>> CC: Matthew Auld <matthew.auld@intel.com>
>>>>>> CC: Andrzej Hajda <andrzej.hajda@intel.com>
>>>>>> CC: Nirmoy Das <nirmoy.das@intel.com>
>>>>>> CC: Andi Shyti <andi.shyti@linux.intel.com>
>>>>>> ---
>>>>>>   drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 9 +++++++--
>>>>>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>>>>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>>>>> index d63f30efd631..b569624f2ed9 100644
>>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>>>>> @@ -704,11 +704,16 @@ static int i915_ttm_access_memory(struct
>>>>>> ttm_buffer_object *bo,
>>>>>>                     int len, int write)
>>>>>>   {
>>>>>>       struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
>>>>>> -    resource_size_t iomap = obj->mm.region->iomap.base -
>>>>>> -        obj->mm.region->region.start;
>>>>>> +    resource_size_t iomap;
>>>>>>       unsigned long page = offset >> PAGE_SHIFT;
>>>>>>       unsigned long bytes_left = len;
>>>>>> +    if (!obj)
>>>>>> +        return -EINVAL;
>>>>>> +
>>>>>> +    iomap = obj->mm.region->iomap.base -
>>>>>> +        obj->mm.region->region.start;
>>>>>> +
>>>>>>       /*
>>>>>>        * TODO: For now just let it fail if the resource is
>>>>>> non-mappable,
>>>>>>        * otherwise we need to perform the memcpy from the gpu
>>>>>> here, without
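[Editorial sketch, not part of the thread] The idea agreed on above (keep the `bo->destroy != i915_ttm_bo_destroy` test out of the conversion helper and apply it only in the callbacks that may legitimately be handed a ghost object) can be illustrated with a small userspace mock. Everything here is an assumption made for illustration: the structures are simplified stand-ins for the kernel ones, `mock_ghost_destroy` and `mock_ttm_callback` are hypothetical, and `i915_ttm_is_ghost_bo()` is only the name floated in the discussion, not a confirmed driver API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel structures (heavily simplified). */
struct ttm_buffer_object {
	void (*destroy)(struct ttm_buffer_object *bo);
	int state;
};

struct drm_i915_gem_object {
	int id;
	struct ttm_buffer_object ttm;	/* embedded TTM object (mock layout) */
};

static void i915_ttm_bo_destroy(struct ttm_buffer_object *bo) { bo->state = 1; }

/* Hypothetical destructor standing in for whatever a ghost object carries. */
static void mock_ghost_destroy(struct ttm_buffer_object *bo) { bo->state = 2; }

/* Name taken from the thread; the body is an assumption based on the
 * "bo->destroy != i915_ttm_bo_destroy" test quoted in the discussion. */
static bool i915_ttm_is_ghost_bo(const struct ttm_buffer_object *bo)
{
	return bo->destroy != i915_ttm_bo_destroy;
}

/* Plain conversion with no NULL return; callers that can see ghost
 * objects must filter them first. The assert is a mock-only sanity check. */
static struct drm_i915_gem_object *i915_ttm_to_gem(struct ttm_buffer_object *bo)
{
	assert(!i915_ttm_is_ghost_bo(bo));
	return (struct drm_i915_gem_object *)
		((char *)bo - offsetof(struct drm_i915_gem_object, ttm));
}

/* Hypothetical callback that TTM itself may invoke with a ghost object
 * (for example during eviction), so it is the place that checks. */
static int mock_ttm_callback(struct ttm_buffer_object *bo)
{
	if (i915_ttm_is_ghost_bo(bo))
		return -EINVAL;

	return i915_ttm_to_gem(bo)->id;
}
```

With this split, a path such as access_memory, which per the discussion should never see a ghost object, would call the conversion directly, while the explicit check documents which entry points can receive one.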
Hi Matt,

On Fri, Oct 14, 2022 at 10:44:11AM +0100, Matthew Auld wrote:
> On 14/10/2022 09:56, Andi Shyti wrote:
> > On Fri, Oct 14, 2022 at 09:39:52AM +0100, Matthew Auld wrote:
> > > On 13/10/2022 18:56, Jonathan Cavitt wrote:
> > > > i915_ttm_to_gem can return a NULL pointer, which is
> > > > dereferenced in i915_ttm_access_memory without first
> > > > checking if it is NULL. Inspecting
> > > > i915_ttm_io_mem_reserve, it appears the correct
> > > > behavior in this case is to return -EINVAL.
> > >
> > > The GEM object has already been dereferenced before this point, if you look
> > > at the caller (vm_access_ttm). The NULL obj thing is to identify "ttm ghost
> > > objects", and I don't think a normal userspace object can suddenly become one
> > > (access_memory comes from ptrace). AFAIK ghost objects are just for
> > > temporarily hanging on to some memory/state, while the dma-resv is busy. In
> > > the places where ttm is the one giving us the object, then it might be
> > > possible to see these types of objects, since ttm could in theory pass one
> > > in (like during eviction).
> >
> > True that, but because from a code perspective we can still receive
> > NULL, I think the check is correct, perhaps we could:
> >
> > 	if (unlikely(!obj))
> > 		return -EINVAL;
>
> Hmm, so that will dereference some pointer, and then later check if it is
> NULL here? Or do you mean to move this into vm_access()? If we are given a
> "ghost object" for ptrace this would likely mean we have a very nasty bug
> somewhere (unless I'm misunderstanding something), and so returning a normal
> user error here doesn't seem right to me (maybe this just hides the issue)?
> Letting it crash seems fine to me tbh. It also makes the code harder to
> understand IMO, because looking at this it now suggests that it is somehow
> possible to have a "ghost object" here. Also there are a fair few places
> calling i915_ttm_to_gem() which already don't check for NULL, since it
> should be impossible, like it should be here.

By just analyzing the code, getting NULL is not impossible. In that
case even a GEM_BUG_ON would have worked. But the NULL pointer, as it
is, needs to be checked.

Anyway, I see that an agreement has been reached with Nirmoy, so it
doesn't matter anymore :)

Andi
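[Editorial sketch, not part of the thread] The two options weighed in this exchange, returning -EINVAL on a NULL object versus treating NULL as a driver bug, can be contrasted in a small userspace mock. The struct and both function names are hypothetical, and `assert()` is used only as a stand-in for `GEM_BUG_ON()`; neither variant is the driver's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Mock object; the real struct drm_i915_gem_object is far larger. */
struct mock_gem_object {
	unsigned long iomap_base;
	unsigned long region_start;
};

/* Variant A: treat NULL as a recoverable error, as the patch proposes. */
static int access_memory_checked(const struct mock_gem_object *obj,
				 unsigned long *iomap)
{
	if (!obj)
		return -EINVAL;

	*iomap = obj->iomap_base - obj->region_start;
	return 0;
}

/* Variant B: treat NULL as a bug in the caller. assert() stands in for
 * GEM_BUG_ON() and aborts instead of reporting an error to the user. */
static int access_memory_asserting(const struct mock_gem_object *obj,
				   unsigned long *iomap)
{
	assert(obj);

	*iomap = obj->iomap_base - obj->region_start;
	return 0;
}
```

Variant A keeps running and hands the caller an ordinary error code, which is the behaviour Matthew argues could hide a serious bug; Variant B fails loudly at the point where the invariant is broken.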
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index d63f30efd631..b569624f2ed9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -704,11 +704,16 @@ static int i915_ttm_access_memory(struct ttm_buffer_object *bo,
 				  int len, int write)
 {
 	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
-	resource_size_t iomap = obj->mm.region->iomap.base -
-		obj->mm.region->region.start;
+	resource_size_t iomap;
 	unsigned long page = offset >> PAGE_SHIFT;
 	unsigned long bytes_left = len;
 
+	if (!obj)
+		return -EINVAL;
+
+	iomap = obj->mm.region->iomap.base -
+		obj->mm.region->region.start;
+
 	/*
 	 * TODO: For now just let it fail if the resource is non-mappable,
 	 * otherwise we need to perform the memcpy from the gpu here, without
i915_ttm_to_gem can return a NULL pointer, which is
dereferenced in i915_ttm_access_memory without first
checking if it is NULL. Inspecting
i915_ttm_io_mem_reserve, it appears the correct
behavior in this case is to return -EINVAL.

Fixes: 26b15eb0 ("drm/i915/ttm: implement access_memory")
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Suggested-by: John C Harrison <John.C.Harrison@intel.com>
CC: Matthew Auld <matthew.auld@intel.com>
CC: Andrzej Hajda <andrzej.hajda@intel.com>
CC: Nirmoy Das <nirmoy.das@intel.com>
CC: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)