diff mbox

dax: fix offset overflow in dax_io

Message ID 82c44164-4ed6-0f5b-872d-f60fb685ca22@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Eric Sandeen June 23, 2016, 9:54 p.m. UTC
This isn't functionally apparent for some reason, but
when we test io at extreme offsets at the end of the loff_t
rang, such as in fstests xfs/071, the calculation of
"max" in dax_io() can be wrong due to pos + size overflowing.

For example,

# xfs_io -c "pwrite 9223372036854771712 512" /mnt/test/file

enters dax_io with:

start 0x7ffffffffffff000
end   0x7ffffffffffff200

and the rounded up "size" variable is 0x1000.  This yields:

pos + size 0x8000000000000000 (overflows loff_t)
       end 0x7ffffffffffff200

Due to the overflow, the min() function picks the wrong
value for the "max" variable, and when we send (max - pos)
into i.e. copy_from_iter_pmem() it is also the wrong value.

This somehow(tm) gets magically absorbed without incident,
probably because iter->count is correct.  But it seems best
to fix it up properly by comparing the two values as
unsigned.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---


--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Dan Williams June 25, 2016, 5:39 p.m. UTC | #1
On Thu, Jun 23, 2016 at 2:54 PM, Eric Sandeen <sandeen@redhat.com> wrote:
> This isn't functionally apparent for some reason, but
> when we test io at extreme offsets at the end of the loff_t
> rang, such as in fstests xfs/071, the calculation of
> "max" in dax_io() can be wrong due to pos + size overflowing.
>
> For example,
>
> # xfs_io -c "pwrite 9223372036854771712 512" /mnt/test/file
>
> enters dax_io with:
>
> start 0x7ffffffffffff000
> end   0x7ffffffffffff200
>
> and the rounded up "size" variable is 0x1000.  This yields:
>
> pos + size 0x8000000000000000 (overflows loff_t)
>        end 0x7ffffffffffff200
>
> Due to the overflow, the min() function picks the wrong
> value for the "max" variable, and when we send (max - pos)
> into i.e. copy_from_iter_pmem() it is also the wrong value.
>
> This somehow(tm) gets magically absorbed without incident,
> probably because iter->count is correct.  But it seems best
> to fix it up properly by comparing the two values as
> unsigned.

So this is 4.8 material, or a user visible bug that we should take
into 4.7 and -stable?
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Sandeen June 25, 2016, 6:49 p.m. UTC | #2
On 6/25/16 12:39 PM, Dan Williams wrote:
> On Thu, Jun 23, 2016 at 2:54 PM, Eric Sandeen <sandeen@redhat.com> wrote:
>> This isn't functionally apparent for some reason, but
>> when we test io at extreme offsets at the end of the loff_t
>> rang, such as in fstests xfs/071, the calculation of
>> "max" in dax_io() can be wrong due to pos + size overflowing.
>>
>> For example,
>>
>> # xfs_io -c "pwrite 9223372036854771712 512" /mnt/test/file
>>
>> enters dax_io with:
>>
>> start 0x7ffffffffffff000
>> end   0x7ffffffffffff200
>>
>> and the rounded up "size" variable is 0x1000.  This yields:
>>
>> pos + size 0x8000000000000000 (overflows loff_t)
>>        end 0x7ffffffffffff200
>>
>> Due to the overflow, the min() function picks the wrong
>> value for the "max" variable, and when we send (max - pos)
>> into i.e. copy_from_iter_pmem() it is also the wrong value.
>>
>> This somehow(tm) gets magically absorbed without incident,
>> probably because iter->count is correct.  But it seems best
>> to fix it up properly by comparing the two values as
>> unsigned.
> 
> So this is 4.8 material, or a user visible bug that we should take
> into 4.7 and -stable?

I have not seen any user-visible problems upstream, but the same
behavior certainly exists in older upstream kernels.  To be honest
I haven't quite sorted out why sending the "wrong" length into
copy_from_iter_pmem() doesn't seem to matter.

It only exists at the 8EB boundary, so other than fstests specific
to that scenario, I don't think it's much of a real-world problem.

It's probably fine and safe for -stable, but it's in no way
critical or urgent IMHO.

Thanks,
-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dan Williams June 27, 2016, 6:43 p.m. UTC | #3
On Sat, Jun 25, 2016 at 11:49 AM, Eric Sandeen <sandeen@sandeen.net> wrote:
>
> On 6/25/16 12:39 PM, Dan Williams wrote:
>> On Thu, Jun 23, 2016 at 2:54 PM, Eric Sandeen <sandeen@redhat.com> wrote:
>>> This isn't functionally apparent for some reason, but
>>> when we test io at extreme offsets at the end of the loff_t
>>> rang, such as in fstests xfs/071, the calculation of
>>> "max" in dax_io() can be wrong due to pos + size overflowing.
>>>
>>> For example,
>>>
>>> # xfs_io -c "pwrite 9223372036854771712 512" /mnt/test/file
>>>
>>> enters dax_io with:
>>>
>>> start 0x7ffffffffffff000
>>> end   0x7ffffffffffff200
>>>
>>> and the rounded up "size" variable is 0x1000.  This yields:
>>>
>>> pos + size 0x8000000000000000 (overflows loff_t)
>>>        end 0x7ffffffffffff200
>>>
>>> Due to the overflow, the min() function picks the wrong
>>> value for the "max" variable, and when we send (max - pos)
>>> into i.e. copy_from_iter_pmem() it is also the wrong value.
>>>
>>> This somehow(tm) gets magically absorbed without incident,
>>> probably because iter->count is correct.  But it seems best
>>> to fix it up properly by comparing the two values as
>>> unsigned.
>>
>> So this is 4.8 material, or a user visible bug that we should take
>> into 4.7 and -stable?
>
> I have not seen any user-visible problems upstream, but the same
> behavior certainly exists in older upstream kernels.  To be honest
> I haven't quite sorted out why sending the "wrong" length into
> copy_from_iter_pmem() doesn't seem to matter.
>
> It only exists at the 8EB boundary, so other than fstests specific
> to that scenario, I don't think it's much of a real-world problem.
>
> It's probably fine and safe for -stable, but it's in no way
> critical or urgent IMHO.
>

Ok, I'll queue it with a few other libnvdimm fixes that are pending.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/dax.c b/fs/dax.c
index 761495b..e207f8f 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -208,7 +208,12 @@  static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
 				dax.addr += first;
 				size = map_len - first;
 			}
-			max = min(pos + size, end);
+			/*
+			 * pos + size is one past the last offset for IO,
+			 * so pos + size can overflow loff_t at extreme offsets.
+			 * Cast to u64 to catch this and get the true minimum.
+			 */
+			max = min_t(u64, pos + size, end);
 		}
 
 		if (iov_iter_rw(iter) == WRITE) {