ram/gf100-: error out if a ridiculous amount of vram is detected

Message ID 1432152067-32104-1-git-send-email-imirkin@alum.mit.edu (mailing list archive)
State New, archived

Commit Message

Ilia Mirkin May 20, 2015, 8:01 p.m. UTC
Some newer chips have trouble coming up, and we get bad MMIO reads from
them, like 0xbadf100. This ends up translating into crazy amounts of
VRAM, which destroys all sorts of other logic down the line. Instead,
fail device init.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: stable@kernel.org
---
 drm/nouveau/nvkm/subdev/fb/ramgf100.c | 6 ++++++
 1 file changed, 6 insertions(+)
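
To see why one bad read becomes a "crazy amount of VRAM", here is a minimal standalone sketch. It assumes that the driver reads a per-partition memory amount in MiB and sums those values shifted left by 20 bits; the hunk in this patch does not show that code. `vram_bytes`, its parameters, and `VRAM_SANE_MAX` are hypothetical names for illustration, not nouveau identifiers.

```c
#include <stdint.h>

/* Threshold used by the patch: 1 TiB. The macro name is hypothetical. */
#define VRAM_SANE_MAX (1ULL << 40)

/*
 * Hypothetical helper: total VRAM in bytes if each of `parts` partitions
 * reports `psize_mib`. Assumption: the register value is in MiB.
 */
static uint64_t vram_bytes(uint32_t psize_mib, unsigned int parts)
{
	uint64_t size = 0;
	unsigned int i;

	/* Widen before shifting so the MiB-to-bytes conversion cannot wrap. */
	for (i = 0; i < parts; i++)
		size += (uint64_t)psize_mib << 20;

	return size;
}
```

Under that assumption, a single partition returning the bogus value 0xbadf100 gives `vram_bytes(0xbadf100, 1) == 0xbadf10000000`, roughly 187 TiB and well past `VRAM_SANE_MAX`, while a plausible 1024 MiB on each of 4 partitions gives 4 GiB.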

Comments

Tobias Klausmann May 20, 2015, 9:35 p.m. UTC | #1
Any idea on how to solve the problem, other than just reporting it?

But for now this adds a helpful error message... you may add my R-b.

On 20.05.2015 22:01, Ilia Mirkin wrote:
> Some newer chips have trouble coming up, and we get bad MMIO reads from
> them, like 0xbadf100. This ends up translating into crazy amounts of
> VRAM, which destroys all sorts of other logic down the line. Instead,
> fail device init.
>
> Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
> Cc: stable@kernel.org
> ---
>   drm/nouveau/nvkm/subdev/fb/ramgf100.c | 6 ++++++
>   1 file changed, 6 insertions(+)
>
> diff --git a/drm/nouveau/nvkm/subdev/fb/ramgf100.c b/drm/nouveau/nvkm/subdev/fb/ramgf100.c
> index de9f395..9d4d196 100644
> --- a/drm/nouveau/nvkm/subdev/fb/ramgf100.c
> +++ b/drm/nouveau/nvkm/subdev/fb/ramgf100.c
> @@ -545,6 +545,12 @@ gf100_ram_create_(struct nvkm_object *parent, struct nvkm_object *engine,
>   		}
>   	}
>   
> +	/* if over 1TB of VRAM is reported, something went very wrong, bail */
> +	if (ram->size > (1ULL << 40)) {
> +		nv_error(pfb, "invalid vram size: %llx\n", ram->size);
> +		return -EINVAL;
> +	}
> +
>   	/* if all controllers have the same amount attached, there's no holes */
>   	if (uniform) {
>   		offset = rsvd_head;
Ilia Mirkin May 20, 2015, 9:47 p.m. UTC | #2
Someone will have to trudge through a mmiotrace and figure out what
magic bit we need to set in order to bring it out of deep sleep. Or
perhaps NVIDIA will graciously tell us, which they eventually did for
GK104/GK106 (but their instructions appear to be insufficient for at
least some GK106's).

But I've seen these errors every so often on various cards... we stick
things at the end of VRAM, which causes no end of confusion when we
think that it's a few PB out :)

On Wed, May 20, 2015 at 5:35 PM, Tobias Klausmann
<tobias.johannes.klausmann@mni.thm.de> wrote:
> Any idea on how to solve the problem, other than just reporting it?
>
> But for now this adds a helpful error message... you may add my R-b.
>
>
> On 20.05.2015 22:01, Ilia Mirkin wrote:
>>
>> Some newer chips have trouble coming up, and we get bad MMIO reads from
>> them, like 0xbadf100. This ends up translating into crazy amounts of
>> VRAM, which destroys all sorts of other logic down the line. Instead,
>> fail device init.
>>
>> Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
>> Cc: stable@kernel.org
>> ---
>>   drm/nouveau/nvkm/subdev/fb/ramgf100.c | 6 ++++++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/drm/nouveau/nvkm/subdev/fb/ramgf100.c b/drm/nouveau/nvkm/subdev/fb/ramgf100.c
>> index de9f395..9d4d196 100644
>> --- a/drm/nouveau/nvkm/subdev/fb/ramgf100.c
>> +++ b/drm/nouveau/nvkm/subdev/fb/ramgf100.c
>> @@ -545,6 +545,12 @@ gf100_ram_create_(struct nvkm_object *parent, struct nvkm_object *engine,
>>                 }
>>         }
>>
>> +       /* if over 1TB of VRAM is reported, something went very wrong, bail */
>> +       if (ram->size > (1ULL << 40)) {
>> +               nv_error(pfb, "invalid vram size: %llx\n", ram->size);
>> +               return -EINVAL;
>> +       }
>> +
>>         /* if all controllers have the same amount attached, there's no holes */
>>         if (uniform) {
>>                 offset = rsvd_head;
>
>
Ben Skeggs May 21, 2015, 4:45 a.m. UTC | #3
On 21 May 2015 at 06:01, Ilia Mirkin <imirkin@alum.mit.edu> wrote:
> Some newer chips have trouble coming up, and we get bad MMIO reads from
> them, like 0xbadf100. This ends up translating into crazy amounts of
> VRAM, which destroys all sorts of other logic down the line. Instead,
> fail device init.
Hrm, I'm not sure what I think of doing something like this.  Where do
we draw the line at validating stuff we read from GPU registers?
Either way, we still have a bug, so I'm not sure what we gain from
working around it like this.

Ben.

>
> Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
> Cc: stable@kernel.org
> ---
>  drm/nouveau/nvkm/subdev/fb/ramgf100.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drm/nouveau/nvkm/subdev/fb/ramgf100.c b/drm/nouveau/nvkm/subdev/fb/ramgf100.c
> index de9f395..9d4d196 100644
> --- a/drm/nouveau/nvkm/subdev/fb/ramgf100.c
> +++ b/drm/nouveau/nvkm/subdev/fb/ramgf100.c
> @@ -545,6 +545,12 @@ gf100_ram_create_(struct nvkm_object *parent, struct nvkm_object *engine,
>                 }
>         }
>
> +       /* if over 1TB of VRAM is reported, something went very wrong, bail */
> +       if (ram->size > (1ULL << 40)) {
> +               nv_error(pfb, "invalid vram size: %llx\n", ram->size);
> +               return -EINVAL;
> +       }
> +
>         /* if all controllers have the same amount attached, there's no holes */
>         if (uniform) {
>                 offset = rsvd_head;
> --
> 2.3.6
Patch

diff --git a/drm/nouveau/nvkm/subdev/fb/ramgf100.c b/drm/nouveau/nvkm/subdev/fb/ramgf100.c
index de9f395..9d4d196 100644
--- a/drm/nouveau/nvkm/subdev/fb/ramgf100.c
+++ b/drm/nouveau/nvkm/subdev/fb/ramgf100.c
@@ -545,6 +545,12 @@  gf100_ram_create_(struct nvkm_object *parent, struct nvkm_object *engine,
 		}
 	}
 
+	/* if over 1TB of VRAM is reported, something went very wrong, bail */
+	if (ram->size > (1ULL << 40)) {
+		nv_error(pfb, "invalid vram size: %llx\n", ram->size);
+		return -EINVAL;
+	}
+
 	/* if all controllers have the same amount attached, there's no holes */
 	if (uniform) {
 		offset = rsvd_head;
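
For experimenting with the added check outside the kernel, it can be sketched as a standalone function. This is only an illustration: `check_vram_size` is a hypothetical name, and `fprintf` stands in for the driver's `nv_error`.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the check; not a nouveau function. */
static int check_vram_size(uint64_t size)
{
	/* Same condition as the patch: strictly more than 1 TiB is rejected. */
	if (size > (1ULL << 40)) {
		fprintf(stderr, "invalid vram size: %llx\n",
			(unsigned long long)size);
		return -EINVAL;
	}

	return 0;
}
```

The comparison is strict, so a size of exactly 1 TiB still passes, and a size of 0 is not caught by this check either.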