diff mbox series

scsi: ses: Fix out-of-bounds memory access in ses_enclosure_data_process()

Message ID 20190501180535.26718-1-longman@redhat.com (mailing list archive)
State Deferred
Headers show
Series scsi: ses: Fix out-of-bounds memory access in ses_enclosure_data_process() | expand

Commit Message

Waiman Long May 1, 2019, 6:05 p.m. UTC
KASAN has found a slab-out-of-bounds error in ses_enclosure_data_process().

[   27.298092] BUG: KASAN: slab-out-of-bounds in ses_enclosure_data_process+0x919/0xe80 [ses]
[   27.306407] Read of size 1 at addr ffff8807c99048b1 by task systemd-udevd/1563
[   27.315173] CPU: 18 PID: 1563 Comm: systemd-udevd Not tainted 4.18.0-80.23.el8.x86_64+debug #1
[   27.323835] Hardware name: HPE ProLiant XL450 Gen10/ProLiant XL450 Gen10, BIOS U40 10/02/2018
[   27.332410] Call Trace:
  :
[   27.348557]  kasan_report.cold.6+0x92/0x1a6
[   27.352771]  ses_enclosure_data_process+0x919/0xe80 [ses]
[   27.358211]  ? kfree+0xd6/0x2e0
[   27.361376]  ses_intf_add+0xa23/0xef1 [ses]
[   27.365590]  ? class_dev_iter_next+0x6c/0xc0
[   27.370034]  class_interface_register+0x298/0x400
[   27.374769]  ? tsc_cs_mark_unstable+0x60/0x60
[   27.379163]  ? class_dev_iter_exit+0x10/0x10
[   27.384857]  ? 0xffffffffc0838000
[   27.389580]  ses_init+0x12/0x1000 [ses]
[   27.394832]  do_one_initcall+0xe9/0x5fd
  :
[   27.562322] Allocated by task 1563:
[   27.562330]  kasan_kmalloc+0xbf/0xe0
[   27.569433]  __kmalloc+0x150/0x360
[   27.572858]  ses_intf_add+0x76a/0xef1 [ses]
[   27.577068]  class_interface_register+0x298/0x400
[   27.581803]  ses_init+0x12/0x1000 [ses]
[   27.587053]  do_one_initcall+0xe9/0x5fd
[   27.592309]  do_init_module+0x1f2/0x710
[   27.597564]  load_module+0x3e19/0x5910
[   27.601336]  __do_sys_init_module+0x1dd/0x260
[   27.606673]  do_syscall_64+0xa5/0x4a0
[   27.610359]  entry_SYSCALL_64_after_hwframe+0x6a/0xdf
  :
[   27.624932] The buggy address belongs to the object at ffff8807c9904400
                which belongs to the cache kmalloc-2048 of size 2048
[   27.637701] The buggy address is located 1201 bytes inside of
                2048-byte region [ffff8807c9904400, ffff8807c9904c00)
[   27.649683] The buggy address belongs to the page:
[   27.654503] page:ffffea001f264000 count:1 mapcount:0 mapping:ffff880107c15d80 index:0x0 compound_mapcount: 0
[   27.664393] flags: 0x17ffffc0008100(slab|head)
[   27.668865] raw: 0017ffffc0008100 dead000000000100 dead000000000200 ffff880107c15d80
[   27.676656] raw: 0000000000000000 00000000800f000f 00000001ffffffff 0000000000000000
[   27.684444] page dumped because: kasan: bad access detected

The out-of-bounds memory access happens to ses_dev->page10 which has a
size of 1200 bytes in this case. The invalid memory access happens in:

 600 if (addl_desc_ptr &&
 601     /* only find additional descriptions for specific devices */
 602     (type_ptr[0] == ENCLOSURE_COMPONENT_DEVICE ||
 603      type_ptr[0] == ENCLOSURE_COMPONENT_ARRAY_DEVICE ||
 604      type_ptr[0] == ENCLOSURE_COMPONENT_SAS_EXPANDER ||
 605      /* these elements are optional */
 606      type_ptr[0] == ENCLOSURE_COMPONENT_SCSI_TARGET_PORT ||
 607      type_ptr[0] == ENCLOSURE_COMPONENT_SCSI_INITIATOR_PORT ||
 608      type_ptr[0] == ENCLOSURE_COMPONENT_CONTROLLER_ELECTRONICS))
 609         addl_desc_ptr += addl_desc_ptr[1] + 2; <-- here

To fix the out-of-bounds memory access, code is now added to make sure
that addl_desc_ptr will never point to an address outside of its bounds.

With this patch, the KASAN warning is gone.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 drivers/scsi/ses.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Waiman Long May 20, 2019, 2:41 p.m. UTC | #1
On 5/1/19 2:05 PM, Waiman Long wrote:
> KASAN has found a slab-out-of-bounds error in ses_enclosure_data_process().
>
> [   27.298092] BUG: KASAN: slab-out-of-bounds in ses_enclosure_data_process+0x919/0xe80 [ses]
> [   27.306407] Read of size 1 at addr ffff8807c99048b1 by task systemd-udevd/1563
> [   27.315173] CPU: 18 PID: 1563 Comm: systemd-udevd Not tainted 4.18.0-80.23.el8.x86_64+debug #1
> [   27.323835] Hardware name: HPE ProLiant XL450 Gen10/ProLiant XL450 Gen10, BIOS U40 10/02/2018
> [   27.332410] Call Trace:
>   :
> [   27.348557]  kasan_report.cold.6+0x92/0x1a6
> [   27.352771]  ses_enclosure_data_process+0x919/0xe80 [ses]
> [   27.358211]  ? kfree+0xd6/0x2e0
> [   27.361376]  ses_intf_add+0xa23/0xef1 [ses]
> [   27.365590]  ? class_dev_iter_next+0x6c/0xc0
> [   27.370034]  class_interface_register+0x298/0x400
> [   27.374769]  ? tsc_cs_mark_unstable+0x60/0x60
> [   27.379163]  ? class_dev_iter_exit+0x10/0x10
> [   27.384857]  ? 0xffffffffc0838000
> [   27.389580]  ses_init+0x12/0x1000 [ses]
> [   27.394832]  do_one_initcall+0xe9/0x5fd
>   :
> [   27.562322] Allocated by task 1563:
> [   27.562330]  kasan_kmalloc+0xbf/0xe0
> [   27.569433]  __kmalloc+0x150/0x360
> [   27.572858]  ses_intf_add+0x76a/0xef1 [ses]
> [   27.577068]  class_interface_register+0x298/0x400
> [   27.581803]  ses_init+0x12/0x1000 [ses]
> [   27.587053]  do_one_initcall+0xe9/0x5fd
> [   27.592309]  do_init_module+0x1f2/0x710
> [   27.597564]  load_module+0x3e19/0x5910
> [   27.601336]  __do_sys_init_module+0x1dd/0x260
> [   27.606673]  do_syscall_64+0xa5/0x4a0
> [   27.610359]  entry_SYSCALL_64_after_hwframe+0x6a/0xdf
>   :
> [   27.624932] The buggy address belongs to the object at ffff8807c9904400
>                 which belongs to the cache kmalloc-2048 of size 2048
> [   27.637701] The buggy address is located 1201 bytes inside of
>                 2048-byte region [ffff8807c9904400, ffff8807c9904c00)
> [   27.649683] The buggy address belongs to the page:
> [   27.654503] page:ffffea001f264000 count:1 mapcount:0 mapping:ffff880107c15d80 index:0x0 compound_mapcount: 0
> [   27.664393] flags: 0x17ffffc0008100(slab|head)
> [   27.668865] raw: 0017ffffc0008100 dead000000000100 dead000000000200 ffff880107c15d80
> [   27.676656] raw: 0000000000000000 00000000800f000f 00000001ffffffff 0000000000000000
> [   27.684444] page dumped because: kasan: bad access detected
>
> The out-of-bounds memory access happens to ses_dev->page10 which has a
> size of 1200 bytes in this case. The invalid memory access happens in:
>
>  600 if (addl_desc_ptr &&
>  601     /* only find additional descriptions for specific devices */
>  602     (type_ptr[0] == ENCLOSURE_COMPONENT_DEVICE ||
>  603      type_ptr[0] == ENCLOSURE_COMPONENT_ARRAY_DEVICE ||
>  604      type_ptr[0] == ENCLOSURE_COMPONENT_SAS_EXPANDER ||
>  605      /* these elements are optional */
>  606      type_ptr[0] == ENCLOSURE_COMPONENT_SCSI_TARGET_PORT ||
>  607      type_ptr[0] == ENCLOSURE_COMPONENT_SCSI_INITIATOR_PORT ||
>  608      type_ptr[0] == ENCLOSURE_COMPONENT_CONTROLLER_ELECTRONICS))
>  609         addl_desc_ptr += addl_desc_ptr[1] + 2; <-- here
>
> To fix the out-of-bounds memory access, code is now added to make sure
> that addl_desc_ptr will never point to an address outside of its bounds.
>
> With this patch, the KASAN warning is gone.
>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  drivers/scsi/ses.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c
> index 0fc39224ce1e..dbc9acc2df2f 100644
> --- a/drivers/scsi/ses.c
> +++ b/drivers/scsi/ses.c
> @@ -605,9 +605,14 @@ static void ses_enclosure_data_process(struct enclosure_device *edev,
>  			     /* these elements are optional */
>  			     type_ptr[0] == ENCLOSURE_COMPONENT_SCSI_TARGET_PORT ||
>  			     type_ptr[0] == ENCLOSURE_COMPONENT_SCSI_INITIATOR_PORT ||
> -			     type_ptr[0] == ENCLOSURE_COMPONENT_CONTROLLER_ELECTRONICS))
> +			     type_ptr[0] == ENCLOSURE_COMPONENT_CONTROLLER_ELECTRONICS)) {
>  				addl_desc_ptr += addl_desc_ptr[1] + 2;
>  
> +				/* Ensure no out-of-bounds memory access */
> +				if (addl_desc_ptr >= ses_dev->page10 +
> +						     ses_dev->page10_len)
> +					addl_desc_ptr = NULL;
> +			}
>  		}
>  	}
>  	kfree(buf);

Ping! Any comment on this patch.

Cheers,
Longman
James Bottomley May 20, 2019, 2:52 p.m. UTC | #2
On Mon, 2019-05-20 at 10:41 -0400, Waiman Long wrote:
[...]
> > --- a/drivers/scsi/ses.c
> > +++ b/drivers/scsi/ses.c
> > @@ -605,9 +605,14 @@ static void ses_enclosure_data_process(struct
> > enclosure_device *edev,
> >  			     /* these elements are optional */
> >  			     type_ptr[0] ==
> > ENCLOSURE_COMPONENT_SCSI_TARGET_PORT ||
> >  			     type_ptr[0] ==
> > ENCLOSURE_COMPONENT_SCSI_INITIATOR_PORT ||
> > -			     type_ptr[0] ==
> > ENCLOSURE_COMPONENT_CONTROLLER_ELECTRONICS))
> > +			     type_ptr[0] ==
> > ENCLOSURE_COMPONENT_CONTROLLER_ELECTRONICS)) {
> >  				addl_desc_ptr += addl_desc_ptr[1]
> > + 2;
> >  
> > +				/* Ensure no out-of-bounds memory
> > access */
> > +				if (addl_desc_ptr >= ses_dev-
> > >page10 +
> > +						     ses_dev-
> > >page10_len)
> > +					addl_desc_ptr = NULL;
> > +			}
> >  		}
> >  	}
> >  	kfree(buf);
> 
> Ping! Any comment on this patch.

The update looks fine to me:

Reviewed-by: James E.J. Bottomley <jejb@linux.ibm.com>

It might also be interesting to find out how the proliant is
structuring this descriptor array to precipitate the out of bounds: Is
it just an off by one or something more serious?

James
Waiman Long May 20, 2019, 3:24 p.m. UTC | #3
On 5/20/19 10:52 AM, James Bottomley wrote:
> On Mon, 2019-05-20 at 10:41 -0400, Waiman Long wrote:
> [...]
>>> --- a/drivers/scsi/ses.c
>>> +++ b/drivers/scsi/ses.c
>>> @@ -605,9 +605,14 @@ static void ses_enclosure_data_process(struct
>>> enclosure_device *edev,
>>>  			     /* these elements are optional */
>>>  			     type_ptr[0] ==
>>> ENCLOSURE_COMPONENT_SCSI_TARGET_PORT ||
>>>  			     type_ptr[0] ==
>>> ENCLOSURE_COMPONENT_SCSI_INITIATOR_PORT ||
>>> -			     type_ptr[0] ==
>>> ENCLOSURE_COMPONENT_CONTROLLER_ELECTRONICS))
>>> +			     type_ptr[0] ==
>>> ENCLOSURE_COMPONENT_CONTROLLER_ELECTRONICS)) {
>>>  				addl_desc_ptr += addl_desc_ptr[1]
>>> + 2;
>>>  
>>> +				/* Ensure no out-of-bounds memory
>>> access */
>>> +				if (addl_desc_ptr >= ses_dev-
>>>> page10 +
>>> +						     ses_dev-
>>>> page10_len)
>>> +					addl_desc_ptr = NULL;
>>> +			}
>>>  		}
>>>  	}
>>>  	kfree(buf);
>> Ping! Any comment on this patch.
> The update looks fine to me:
>
> Reviewed-by: James E.J. Bottomley <jejb@linux.ibm.com>
>
> It might also be interesting to find out how the proliant is
> structuring this descriptor array to precipitate the out of bounds: Is
> it just an off by one or something more serious?

I didn't look into the detail the enclosure message returned by the
hardware, but I believe it may have more description entries (page7)
than extended description entries (page10).

I can try to reserve the system and find out what exactly is wrong with
that system if you really want to find that out.

Cheers,
Longman
James Bottomley May 20, 2019, 3:46 p.m. UTC | #4
On Mon, 2019-05-20 at 11:24 -0400, Waiman Long wrote:
> On 5/20/19 10:52 AM, James Bottomley wrote:
> > On Mon, 2019-05-20 at 10:41 -0400, Waiman Long wrote:
> > [...]
> > > > --- a/drivers/scsi/ses.c
> > > > +++ b/drivers/scsi/ses.c
> > > > @@ -605,9 +605,14 @@ static void
> > > > ses_enclosure_data_process(struct
> > > > enclosure_device *edev,
> > > >  			     /* these elements are optional */
> > > >  			     type_ptr[0] ==
> > > > ENCLOSURE_COMPONENT_SCSI_TARGET_PORT ||
> > > >  			     type_ptr[0] ==
> > > > ENCLOSURE_COMPONENT_SCSI_INITIATOR_PORT ||
> > > > -			     type_ptr[0] ==
> > > > ENCLOSURE_COMPONENT_CONTROLLER_ELECTRONICS))
> > > > +			     type_ptr[0] ==
> > > > ENCLOSURE_COMPONENT_CONTROLLER_ELECTRONICS)) {
> > > >  				addl_desc_ptr +=
> > > > addl_desc_ptr[1]
> > > > + 2;
> > > >  
> > > > +				/* Ensure no out-of-bounds
> > > > memory
> > > > access */
> > > > +				if (addl_desc_ptr >= ses_dev-
> > > > > page10 +
> > > > 
> > > > +						     ses_dev-
> > > > > page10_len)
> > > > 
> > > > +					addl_desc_ptr = NULL;
> > > > +			}
> > > >  		}
> > > >  	}
> > > >  	kfree(buf);
> > > 
> > > Ping! Any comment on this patch.
> > 
> > The update looks fine to me:
> > 
> > Reviewed-by: James E.J. Bottomley <jejb@linux.ibm.com>
> > 
> > It might also be interesting to find out how the proliant is
> > structuring this descriptor array to precipitate the out of bounds:
> > Is it just an off by one or something more serious?
> 
> I didn't look into the detail the enclosure message returned by the
> hardware, but I believe it may have more description entries (page7)
> than extended description entries (page10).
> 
> I can try to reserve the system and find out what exactly is wrong
> with that system if you really want to find that out.

Please.  What I'm interested in is whether this is simply a bug in the
array firmware, in which case the fix is sufficient, or whether there's
some problem with the parser, like mismatched expectations over added
trailing nulls or something.

James
Waiman Long May 20, 2019, 3:56 p.m. UTC | #5
On 5/20/19 11:46 AM, James Bottomley wrote:
> On Mon, 2019-05-20 at 11:24 -0400, Waiman Long wrote:
>> On 5/20/19 10:52 AM, James Bottomley wrote:
>>> On Mon, 2019-05-20 at 10:41 -0400, Waiman Long wrote:
>>> [...]
>>>>> --- a/drivers/scsi/ses.c
>>>>> +++ b/drivers/scsi/ses.c
>>>>> @@ -605,9 +605,14 @@ static void
>>>>> ses_enclosure_data_process(struct
>>>>> enclosure_device *edev,
>>>>>  			     /* these elements are optional */
>>>>>  			     type_ptr[0] ==
>>>>> ENCLOSURE_COMPONENT_SCSI_TARGET_PORT ||
>>>>>  			     type_ptr[0] ==
>>>>> ENCLOSURE_COMPONENT_SCSI_INITIATOR_PORT ||
>>>>> -			     type_ptr[0] ==
>>>>> ENCLOSURE_COMPONENT_CONTROLLER_ELECTRONICS))
>>>>> +			     type_ptr[0] ==
>>>>> ENCLOSURE_COMPONENT_CONTROLLER_ELECTRONICS)) {
>>>>>  				addl_desc_ptr +=
>>>>> addl_desc_ptr[1]
>>>>> + 2;
>>>>>  
>>>>> +				/* Ensure no out-of-bounds
>>>>> memory
>>>>> access */
>>>>> +				if (addl_desc_ptr >= ses_dev-
>>>>>> page10 +
>>>>> +						     ses_dev-
>>>>>> page10_len)
>>>>> +					addl_desc_ptr = NULL;
>>>>> +			}
>>>>>  		}
>>>>>  	}
>>>>>  	kfree(buf);
>>>> Ping! Any comment on this patch.
>>> The update looks fine to me:
>>>
>>> Reviewed-by: James E.J. Bottomley <jejb@linux.ibm.com>
>>>
>>> It might also be interesting to find out how the proliant is
>>> structuring this descriptor array to precipitate the out of bounds:
>>> Is it just an off by one or something more serious?
>> I didn't look into the detail the enclosure message returned by the
>> hardware, but I believe it may have more description entries (page7)
>> than extended description entries (page10).
>>
>> I can try to reserve the system and find out what exactly is wrong
>> with that system if you really want to find that out.
> Please.  What I'm interested in is whether this is simply a bug in the
> array firmware, in which case the fix is sufficient, or whether there's
> some problem with the parser, like mismatched expectations over added
> trailing nulls or something.
>
> James
>
OK, will let you know once I get hold of the system.

Cheers,
Longman
Martin K. Petersen May 20, 2019, 4:05 p.m. UTC | #6
James,

> Please.  What I'm interested in is whether this is simply a bug in the
> array firmware, in which case the fix is sufficient, or whether
> there's some problem with the parser, like mismatched expectations
> over added trailing nulls or something.

Our support folks have been looking at this for a while. We have seen
problems with devices from several vendors. To the extent that I gave up
the idea of blacklisting all of them.

I am collecting "bad" SES pages from these devices. I have added support
for RECEIVE DIAGNOSTICS to scsi_debug and added a bunch of deliberately
broken SES pages so we could debug this.

It appears to be very common for devices to return inconsistent or
invalid data. So pretty much all of the ses.c parsing needs to have
sanity checking heuristics added to prevent KASAN hiccups.
Douglas Gilbert May 20, 2019, 9:53 p.m. UTC | #7
On 2019-05-20 12:05 p.m., Martin K. Petersen wrote:
> 
> James,
> 
>> Please.  What I'm interested in is whether this is simply a bug in the
>> array firmware, in which case the fix is sufficient, or whether
>> there's some problem with the parser, like mismatched expectations
>> over added trailing nulls or something.
> 
> Our support folks have been looking at this for a while. We have seen
> problems with devices from several vendors. To the extent that I gave up
> the idea of blacklisting all of them.
> 
> I am collecting "bad" SES pages from these devices. I have added support
> for RECEIVE DIAGNOSTICS to scsi_debug and added a bunch of deliberately
> broken SES pages so we could debug this

Patches ??

> It appears to be very common for devices to return inconsistent or
> invalid data. So pretty much all of the ses.c parsing needs to have
> sanity checking heuristics added to prevent KASAN hiccups.

And it is not just SES device implementations that were broken. The
relationship between Additional Element Status diagnostic page (dpage)
and the Enclosure Status dpage was under-specified in SES-2 and that
led to the EIIOE field being introduced during the SES-3 revisions.
And the meaning of EIIOE was tweaked several times *** before SES-3 was
standardized. Anyone interested in the adventures of EIIOE can see
the code of sg_ses.c in sg3_utils. The sg_ses utility is many times
more complex than anything else in the sg3_utils package.

And that complexity led me to suspect that the Linux SES driver was
broken. It should be 3 or 4 times larger than it is! It simply doesn't
do enough checking.

So yes Martin, you are on the right track.

Doug Gilbert


BTW the NVME Management Interface folks have decided to use SES-3 for
NVME enclosure management rather than invent their own can of worms :-)

*** For example EIIOE started life as a 1 bit field, but two cases
     wasn't enough, so it became a 2 bit field and now uses all
     four possibilities.
Martin K. Petersen May 21, 2019, 12:02 p.m. UTC | #8
Doug,

>> I am collecting "bad" SES pages from these devices. I have added
>> support for RECEIVE DIAGNOSTICS to scsi_debug and added a bunch of
>> deliberately broken SES pages so we could debug this
>
> Patches ??

I have included the plumbing below. However, I need to synthesize the
contents of the pages with problems. I can't share the ones I have
received from customers so I removed the arrays from the patch.
Waiman Long May 21, 2019, 5:23 p.m. UTC | #9
On 5/20/19 11:56 AM, Waiman Long wrote:
> On 5/20/19 11:46 AM, James Bottomley wrote:
>> On Mon, 2019-05-20 at 11:24 -0400, Waiman Long wrote:
>>> On 5/20/19 10:52 AM, James Bottomley wrote:
>>>> On Mon, 2019-05-20 at 10:41 -0400, Waiman Long wrote:
>>>> [...]
>>>>>> --- a/drivers/scsi/ses.c
>>>>>> +++ b/drivers/scsi/ses.c
>>>>>> @@ -605,9 +605,14 @@ static void
>>>>>> ses_enclosure_data_process(struct
>>>>>> enclosure_device *edev,
>>>>>>  			     /* these elements are optional */
>>>>>>  			     type_ptr[0] ==
>>>>>> ENCLOSURE_COMPONENT_SCSI_TARGET_PORT ||
>>>>>>  			     type_ptr[0] ==
>>>>>> ENCLOSURE_COMPONENT_SCSI_INITIATOR_PORT ||
>>>>>> -			     type_ptr[0] ==
>>>>>> ENCLOSURE_COMPONENT_CONTROLLER_ELECTRONICS))
>>>>>> +			     type_ptr[0] ==
>>>>>> ENCLOSURE_COMPONENT_CONTROLLER_ELECTRONICS)) {
>>>>>>  				addl_desc_ptr +=
>>>>>> addl_desc_ptr[1]
>>>>>> + 2;
>>>>>>  
>>>>>> +				/* Ensure no out-of-bounds
>>>>>> memory
>>>>>> access */
>>>>>> +				if (addl_desc_ptr >= ses_dev-
>>>>>>> page10 +
>>>>>> +						     ses_dev-
>>>>>>> page10_len)
>>>>>> +					addl_desc_ptr = NULL;
>>>>>> +			}
>>>>>>  		}
>>>>>>  	}
>>>>>>  	kfree(buf);
>>>>> Ping! Any comment on this patch.
>>>> The update looks fine to me:
>>>>
>>>> Reviewed-by: James E.J. Bottomley <jejb@linux.ibm.com>
>>>>
>>>> It might also be interesting to find out how the proliant is
>>>> structuring this descriptor array to precipitate the out of bounds:
>>>> Is it just an off by one or something more serious?
>>> I didn't look into the detail the enclosure message returned by the
>>> hardware, but I believe it may have more description entries (page7)
>>> than extended description entries (page10).
>>>
>>> I can try to reserve the system and find out what exactly is wrong
>>> with that system if you really want to find that out.
>> Please.  What I'm interested in is whether this is simply a bug in the
>> array firmware, in which case the fix is sufficient, or whether there's
>> some problem with the parser, like mismatched expectations over added
>> trailing nulls or something.
>>
>> James
>>
> OK, will let you know once I get hold of the system.

I now have access to the xl450 system that can reproduce the error.  12
enclosure messages are processed during bootup. Of the one that causes
the KASAN warning, the byte dump of page 1 and 10 are:

[   28.185749] ses page 10 (len = 1200):
[   28.185762] a-0000: 0a 00 04 ac 00 00 00 00 16 22 00 00 01 01 00 01
[   28.185768] a-0010: 00 00 00 01 50 01 43 80 33 d4 45 bd 50 01 43 80
[   28.185774] a-0020: 33 d4 45 80 00 00 00 00 00 00 00 00 16 22 00 01
[   28.185779] a-0030: 01 01 00 02 00 00 00 01 50 01 43 80 33 d4 45 bd
[   28.185785] a-0040: 50 01 43 80 33 d4 45 81 00 00 00 00 00 00 00 00
[   28.185790] a-0050: 96 22 00 02 01 01 00 03 00 00 00 00 00 00 00 00
[   28.185794] a-0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   28.185798] a-0070: 00 00 00 00 96 22 00 03 01 01 00 04 00 00 00 00
[   28.185801] a-0080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   28.185805] a-0090: 00 00 00 00 00 00 00 00 96 22 00 04 01 01 00 05
[   28.185808] a-00a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   28.185812] a-00b0: 00 00 00 00 00 00 00 00 00 00 00 00 96 22 00 05
[   28.185815] a-00c0: 01 01 00 06 00 00 00 00 00 00 00 00 00 00 00 00
[   28.185818] a-00d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   28.185822] a-00e0: 96 22 00 06 01 01 00 07 00 00 00 00 00 00 00 00
[   28.185825] a-00f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   28.185829] a-0100: 00 00 00 00 96 22 00 07 01 01 00 08 00 00 00 00
[   28.185832] a-0110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   28.185836] a-0120: 00 00 00 00 00 00 00 00 96 22 00 08 01 01 00 09
[   28.185839] a-0130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   28.185843] a-0140: 00 00 00 00 00 00 00 00 00 00 00 00 96 22 00 09
[   28.185846] a-0150: 01 01 00 0a 00 00 00 00 00 00 00 00 00 00 00 00
[   28.185850] a-0160: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   28.185853] a-0170: 96 22 00 0a 01 01 00 0b 00 00 00 00 00 00 00 00
[   28.185857] a-0180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   28.185860] a-0190: 00 00 00 00 96 22 00 0b 01 01 00 0c 00 00 00 00
[   28.185864] a-01a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   28.185867] a-01b0: 00 00 00 00 00 00 00 00 96 22 00 0c 01 01 00 0d
[   28.185871] a-01c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   28.185874] a-01d0: 00 00 00 00 00 00 00 00 00 00 00 00 96 22 00 0d
[   28.185877] a-01e0: 01 01 00 0e 00 00 00 00 00 00 00 00 00 00 00 00
[   28.185881] a-01f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   28.185884] a-0200: 96 22 00 0e 01 01 00 0f 00 00 00 00 00 00 00 00
[   28.185888] a-0210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   28.185891] a-0220: 00 00 00 00 96 22 00 0f 01 01 00 10 00 00 00 00
[   28.185895] a-0230: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   28.185898] a-0240: 00 00 00 00 00 00 00 00 96 22 00 10 01 01 00 11
[   28.185902] a-0250: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   28.185905] a-0260: 00 00 00 00 00 00 00 00 00 00 00 00 96 22 00 11
[   28.185909] a-0270: 01 01 00 12 00 00 00 00 00 00 00 00 00 00 00 00
[   28.185912] a-0280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   28.185916] a-0290: 96 22 00 12 01 01 00 13 00 00 00 00 00 00 00 00
[   28.185919] a-02a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   28.185923] a-02b0: 00 00 00 00 96 22 00 13 01 01 00 14 00 00 00 00
[   28.185926] a-02c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   28.185930] a-02d0: 00 00 00 00 00 00 00 00 96 22 00 14 01 01 00 15
[   28.185933] a-02e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   28.185937] a-02f0: 00 00 00 00 00 00 00 00 00 00 00 00 96 22 00 15
[   28.185940] a-0300: 01 01 00 16 00 00 00 00 00 00 00 00 00 00 00 00
[   28.185944] a-0310: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   28.185947] a-0320: 96 22 00 16 01 01 00 17 00 00 00 00 00 00 00 00
[   28.185951] a-0330: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   28.185954] a-0340: 00 00 00 00 96 22 00 17 01 01 00 18 00 00 00 00
[   28.185958] a-0350: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   28.185961] a-0360: 00 00 00 00 00 00 00 00 96 22 00 18 01 01 00 19
[   28.185965] a-0370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   28.185968] a-0380: 00 00 00 00 00 00 00 00 00 00 00 00 96 22 00 19
[   28.185972] a-0390: 01 01 00 1a 00 00 00 00 00 00 00 00 00 00 00 00
[   28.185975] a-03a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   28.185979] a-03b0: 96 22 00 1a 01 01 00 1b 00 00 00 00 00 00 00 00
[   28.185982] a-03c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   28.185986] a-03d0: 00 00 00 00 96 22 00 1b 01 01 00 1c 00 00 00 00
[   28.185989] a-03e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   28.185993] a-03f0: 00 00 00 00 00 00 00 00 96 22 00 1c 01 01 00 1d
[   28.185996] a-0400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   28.185999] a-0410: 00 00 00 00 00 00 00 00 00 00 00 00 96 22 00 1d
[   28.186023] a-0420: 01 01 00 1e 00 00 00 00 00 00 00 00 00 00 00 00
[   28.186026] a-0430: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   28.186030] a-0440: 16 6e 00 47 30 40 00 00 50 01 43 80 33 d4 45 bd
[   28.186034] a-0450: 49 01 4a 02 4b 03 4c 04 4d 05 4e 06 4f 07 50 08
[   28.186038] a-0460: 51 09 52 0a 53 0b 54 0c 55 0d 56 0e 57 0f 58 10
[   28.186041] a-0470: 59 11 5a 12 5b 13 5c 14 5d 15 5e 16 5f 17 60 18
[   28.186045] a-0480: 61 19 62 1a 63 1b 64 1c 65 1d 66 1e ff ff ff ff
[   28.186048] a-0490: 68 ff 69 ff 6a ff 6b ff 6c ff 6d ff 6e ff 6f ff
[   28.186052] a-04a0: 71 ff 72 ff 73 ff 74 ff 75 ff 76 ff 77 ff 78 ff
[   28.188282] ses page 1 (len = 36):
[   28.188290] 1-0000: 17 1e 00 0c 04 03 00 14 89 1e 00 18 07 01 00 0c
[   28.188298] 1-0010: 0e 01 00 10 18 01 00 0c 19 1e 00 14 19 08 00 20
[   28.188302] 1-0020: 19 08 00 20

Page 7 can't be retrieved for this message.

Please let me know what else do you want to have.

Cheers,
Longman
Waiman Long July 18, 2019, 6:18 p.m. UTC | #10
On 5/20/19 10:52 AM, James Bottomley wrote:
> On Mon, 2019-05-20 at 10:41 -0400, Waiman Long wrote:
> [...]
>>> --- a/drivers/scsi/ses.c
>>> +++ b/drivers/scsi/ses.c
>>> @@ -605,9 +605,14 @@ static void ses_enclosure_data_process(struct
>>> enclosure_device *edev,
>>>  			     /* these elements are optional */
>>>  			     type_ptr[0] ==
>>> ENCLOSURE_COMPONENT_SCSI_TARGET_PORT ||
>>>  			     type_ptr[0] ==
>>> ENCLOSURE_COMPONENT_SCSI_INITIATOR_PORT ||
>>> -			     type_ptr[0] ==
>>> ENCLOSURE_COMPONENT_CONTROLLER_ELECTRONICS))
>>> +			     type_ptr[0] ==
>>> ENCLOSURE_COMPONENT_CONTROLLER_ELECTRONICS)) {
>>>  				addl_desc_ptr += addl_desc_ptr[1]
>>> + 2;
>>>  
>>> +				/* Ensure no out-of-bounds memory
>>> access */
>>> +				if (addl_desc_ptr >= ses_dev-
>>>> page10 +
>>> +						     ses_dev-
>>>> page10_len)
>>> +					addl_desc_ptr = NULL;
>>> +			}
>>>  		}
>>>  	}
>>>  	kfree(buf);
>> Ping! Any comment on this patch.
> The update looks fine to me:
>
> Reviewed-by: James E.J. Bottomley <jejb@linux.ibm.com>
>
> It might also be interesting to find out how the proliant is
> structuring this descriptor array to precipitate the out of bounds: Is
> it just an off by one or something more serious?
>
> James
>
Is someone going to merge this patch in the current cycle?

Thanks,
Longman
Martin K. Petersen July 18, 2019, 6:26 p.m. UTC | #11
Waiman,

> Is someone going to merge this patch in the current cycle?

I was hoping somebody would step up and patch all the bad accesses and
not just page 10.
Waiman Long July 18, 2019, 6:29 p.m. UTC | #12
On 7/18/19 2:26 PM, Martin K. Petersen wrote:
> Waiman,
>
>> Is someone going to merge this patch in the current cycle?
> I was hoping somebody would step up and patch all the bad accesses and
> not just page 10.
>
I see.

Thanks,
Longman
diff mbox series

Patch

diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c
index 0fc39224ce1e..dbc9acc2df2f 100644
--- a/drivers/scsi/ses.c
+++ b/drivers/scsi/ses.c
@@ -605,9 +605,14 @@  static void ses_enclosure_data_process(struct enclosure_device *edev,
 			     /* these elements are optional */
 			     type_ptr[0] == ENCLOSURE_COMPONENT_SCSI_TARGET_PORT ||
 			     type_ptr[0] == ENCLOSURE_COMPONENT_SCSI_INITIATOR_PORT ||
-			     type_ptr[0] == ENCLOSURE_COMPONENT_CONTROLLER_ELECTRONICS))
+			     type_ptr[0] == ENCLOSURE_COMPONENT_CONTROLLER_ELECTRONICS)) {
 				addl_desc_ptr += addl_desc_ptr[1] + 2;
 
+				/* Ensure no out-of-bounds memory access */
+				if (addl_desc_ptr >= ses_dev->page10 +
+						     ses_dev->page10_len)
+					addl_desc_ptr = NULL;
+			}
 		}
 	}
 	kfree(buf);