diff mbox

bio: return EINTR if copying to user space got interrupted

Message ID 1455266355-44676-1-git-send-email-hare@suse.de (mailing list archive)
State New, archived
Headers show

Commit Message

Hannes Reinecke Feb. 12, 2016, 8:39 a.m. UTC
Commit 35dc248383bbab0a7203fca4d722875bc81ef091 introduced a check for
current->mm to see if we have a user space context and only copies data
if we do. Now if an IO gets interrupted by a signal data isn't copied
into user space any more (as we don't have a user space context) but
user space isn't notified about it.

This patch modifies the behaviour to return -EINTR from bio_uncopy_user()
to notify userland that a signal has interrupted the syscall, otherwise
it could lead to a situation where the caller may get a buffer with
no data returned.

This can be reproduced by issuing SG_IO ioctl()s in one thread while
constantly sending signals to it.

Fixes: 35dc248 [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Cc: stable@vger.kernel.org # v.3.11+
---
 block/bio.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Jens Axboe Feb. 12, 2016, 3:17 p.m. UTC | #1
On 02/12/2016 01:39 AM, Hannes Reinecke wrote:
> Commit 35dc248383bbab0a7203fca4d722875bc81ef091 introduced a check for
> current->mm to see if we have a user space context and only copies data
> if we do. Now if an IO gets interrupted by a signal data isn't copied
> into user space any more (as we don't have a user space context) but
> user space isn't notified about it.
>
> This patch modifies the behaviour to return -EINTR from bio_uncopy_user()
> to notify userland that a signal has interrupted the syscall, otherwise
> it could lead to a situation where the caller may get a buffer with
> no data returned.
>
> This can be reproduced by issuing SG_IO ioctl()s in one thread while
> constantly sending signals to it.

Good catch, fix looks good to me. Applied for 4.5.
Douglas Gilbert Feb. 12, 2016, 4:05 p.m. UTC | #2
On 16-02-12 03:39 AM, Hannes Reinecke wrote:
> Commit 35dc248383bbab0a7203fca4d722875bc81ef091 introduced a check for
> current->mm to see if we have a user space context and only copies data
> if we do. Now if an IO gets interrupted by a signal data isn't copied
> into user space any more (as we don't have a user space context) but
> user space isn't notified about it.
>
> This patch modifies the behaviour to return -EINTR from bio_uncopy_user()
> to notify userland that a signal has interrupted the syscall, otherwise
> it could lead to a situation where the caller may get a buffer with
> no data returned.

Interesting, the "f091" commit has been in the kernel since 2013
hence your reference to v.3.11 . I always had the feeling that
handling signals that interrupted SG_IO calls was skating on thin
ice. Hence in ddpt (but not sg_dd nor sgp_dd) the code masks out
all signals (that it can) during the SG_IO calls then opens a
signal window briefly after a SG_IO ioctl has finished and before
the next one starts. This approach used by ddpt is borrowed from
dd (in coreutils) which masks signals during its read() and
write() calls.

Any idea how accurate resid is in this scenario?

Doug Gilbert

> This can be reproduced by issuing SG_IO ioctl()s in one thread while
> constantly sending signals to it.
>
> Fixes: 35dc248 [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal
> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> Cc: stable@vger.kernel.org # v.3.11+
> ---
>   block/bio.c | 7 +++++--
>   1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/block/bio.c b/block/bio.c
> index dbabd48..24e5b69 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -1090,9 +1090,12 @@ int bio_uncopy_user(struct bio *bio)
>   	if (!bio_flagged(bio, BIO_NULL_MAPPED)) {
>   		/*
>   		 * if we're in a workqueue, the request is orphaned, so
> -		 * don't copy into a random user address space, just free.
> +		 * don't copy into a random user address space, just free
> +		 * and return -EINTR so user space doesn't expect any data.
>   		 */
> -		if (current->mm && bio_data_dir(bio) == READ)
> +		if (!current->mm)
> +			ret = -EINTR;
> +		else if (bio_data_dir(bio) == READ)
>   			ret = bio_copy_to_iter(bio, bmd->iter);
>   		if (bmd->is_our_pages)
>   			bio_free_pages(bio);
>

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hannes Reinecke Feb. 12, 2016, 4:14 p.m. UTC | #3
On 02/12/2016 05:05 PM, Douglas Gilbert wrote:
> On 16-02-12 03:39 AM, Hannes Reinecke wrote:
>> Commit 35dc248383bbab0a7203fca4d722875bc81ef091 introduced a check for
>> current->mm to see if we have a user space context and only copies data
>> if we do. Now if an IO gets interrupted by a signal data isn't copied
>> into user space any more (as we don't have a user space context) but
>> user space isn't notified about it.
>>
>> This patch modifies the behaviour to return -EINTR from bio_uncopy_user()
>> to notify userland that a signal has interrupted the syscall, otherwise
>> it could lead to a situation where the caller may get a buffer with
>> no data returned.
>
> Interesting, the "f091" commit has been in the kernel since 2013
> hence your reference to v.3.11 . I always had the feeling that
> handling signals that interrupted SG_IO calls was skating on thin
> ice.

Yeah; that bug was really annoying, as occasionally receiving no data 
whatsoever without any indication makes it really, really hard to debug.
Kudos go to Johannes and Ewan for pointing to the offending function.

> Hence in ddpt (but not sg_dd nor sgp_dd) the code masks out
> all signals (that it can) during the SG_IO calls then opens a
> signal window briefly after a SG_IO ioctl has finished and before
> the next one starts. This approach used by ddpt is borrowed from
> dd (in coreutils) which masks signals during its read() and
> write() calls.
>
> Any idea how accurate resid is in this scenario?
>
Bah. F*sk knows.
That takes far more POSIX knowledge than I have.
I would suspect that by masking out signals they'll be marked as pending 
for the process, and will be delivered once you unmask them.
Or dropped, depending.
In either case, once they are blocked out the kernel part shouldn't 
receive a signal, and hence we should be able to receive the data properly.
But as noted I'm not a POSIX expert.

Cheers,

Hannes
Ewan Milne Feb. 12, 2016, 4:15 p.m. UTC | #4
On Fri, 2016-02-12 at 09:39 +0100, Hannes Reinecke wrote:
> Commit 35dc248383bbab0a7203fca4d722875bc81ef091 introduced a check for
> current->mm to see if we have a user space context and only copies data
> if we do. Now if an IO gets interrupted by a signal data isn't copied
> into user space any more (as we don't have a user space context) but
> user space isn't notified about it.
> 
> This patch modifies the behaviour to return -EINTR from bio_uncopy_user()
> to notify userland that a signal has interrupted the syscall, otherwise
> it could lead to a situation where the caller may get a buffer with
> no data returned.
> 
> This can be reproduced by issuing SG_IO ioctl()s in one thread while
> constantly sending signals to it.

Well, this is definitely an improvement, since it now indicates to
the user that there was a problem instead of silently returning with
no data and no error, but it doesn't completely fix the issue.

For one thing, if the SG_IO is performed to a sequential media device,
the I/O is actually performed, and the position changes, it is just the
data that is not copied back.  So e.g. a user program making calls to
read from a tape device that gets signals and retrying on -EINTR will
skip blocks.  There is a similar problem already with SG_IO and
-ERESTARTSYS.  I believe that that only way around this is to mask the
signals.  (This is also problem with non-idempotent commands, such as
COMPARE AND WRITE.)

I should mention that the patch "sg: Fix user memory corruption when
SG_IO is interrupted by a signal" broke a 3rd-party kernel module that
happened to use SG_IO to pass *kernel* addresses for I/O.  Which worked
for them until the current->mm test was added.  (I don't know why they
didn't just put I/O on the request queue like everyone else, though.)

This patch should go in, though.

Reviewed-by: Ewan D. Milne <emilne@redhat.com>

> 
> Fixes: 35dc248 [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal
> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> Cc: stable@vger.kernel.org # v.3.11+
> ---
>  block/bio.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index dbabd48..24e5b69 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -1090,9 +1090,12 @@ int bio_uncopy_user(struct bio *bio)
>  	if (!bio_flagged(bio, BIO_NULL_MAPPED)) {
>  		/*
>  		 * if we're in a workqueue, the request is orphaned, so
> -		 * don't copy into a random user address space, just free.
> +		 * don't copy into a random user address space, just free
> +		 * and return -EINTR so user space doesn't expect any data.
>  		 */
> -		if (current->mm && bio_data_dir(bio) == READ)
> +		if (!current->mm)
> +			ret = -EINTR;
> +		else if (bio_data_dir(bio) == READ)
>  			ret = bio_copy_to_iter(bio, bmd->iter);
>  		if (bmd->is_our_pages)
>  			bio_free_pages(bio);


--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/block/bio.c b/block/bio.c
index dbabd48..24e5b69 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1090,9 +1090,12 @@  int bio_uncopy_user(struct bio *bio)
 	if (!bio_flagged(bio, BIO_NULL_MAPPED)) {
 		/*
 		 * if we're in a workqueue, the request is orphaned, so
-		 * don't copy into a random user address space, just free.
+		 * don't copy into a random user address space, just free
+		 * and return -EINTR so user space doesn't expect any data.
 		 */
-		if (current->mm && bio_data_dir(bio) == READ)
+		if (!current->mm)
+			ret = -EINTR;
+		else if (bio_data_dir(bio) == READ)
 			ret = bio_copy_to_iter(bio, bmd->iter);
 		if (bmd->is_our_pages)
 			bio_free_pages(bio);