diff mbox

4.15.14 crash with iscsi target and dvd

Message ID 20180409233436.GC6450@ming.t460p (mailing list archive)
State Not Applicable
Headers show

Commit Message

Ming Lei April 9, 2018, 11:34 p.m. UTC
On Mon, Apr 09, 2018 at 09:30:11PM +0000, Bart Van Assche wrote:
> On Sun, 2018-04-08 at 12:02 -0400, Wakko Warner wrote:
> > I finished with git bisect.  Here's the output:
> > 84c8590646d5b35804bac60eb58b145839b5893e is the first bad commit
> > commit 84c8590646d5b35804bac60eb58b145839b5893e
> > Author: Ming Lei <tom.leiming@gmail.com>
> > Date:   Fri Nov 11 20:05:32 2016 +0800
> > 
> >     target: avoid accessing .bi_vcnt directly
> >     
> >     When the bio is full, bio_add_pc_page() will return zero,
> >     so use this information tell when the bio is full.
> >     
> >     Also replace access to .bi_vcnt for pr_debug() with bio_segments().
> >     
> >     Reviewed-by: Christoph Hellwig <hch@lst.de>
> >     Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> >     Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> >     Signed-off-by: Jens Axboe <axboe@fb.com>
> > 
> > :040000 040000 a3ebbb71c52ee4eb8c3be4d033b81179211bf704 de39a328dbd1b18519946b3ad46d9302886e0dd0 M      drivers
> > 
> > I did a diff between HEAD^ and HEAD and manually patched the file from
> > 4.15.14.  It's not an exact revert.  I'm running it now and it's working.
> > I'll do a better test later on.  Here's the patch:
> > 
> > --- a/drivers/target/target_core_pscsi.c	2018-02-04 14:31:31.077316617 -0500
> > +++ b/drivers/target/target_core_pscsi.c	2018-04-08 11:43:49.588641374 -0400
> > @@ -915,7 +915,9 @@
> >  					bio, page, bytes, off);
> >  			pr_debug("PSCSI: bio->bi_vcnt: %d nr_vecs: %d\n",
> >  				bio_segments(bio), nr_vecs);
> > -			if (rc != bytes) {
> > +			if (rc != bytes)
> > +				goto fail;
> > +			if (bio->bi_vcnt > nr_vecs) {
> >  				pr_debug("PSCSI: Reached bio->bi_vcnt max:"
> >  					" %d i: %d bio: %p, allocating another"
> >  					" bio\n", bio->bi_vcnt, i, bio);
> 
> Hello Ming,
> 
> Can you have a look at this? The start of this e-mail thread is available at
> https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg72574.html.

Sure, thanks for your sharing.

Wakko, could you test the following patch and see if there is any
difference?

--

Comments

Wakko Warner April 9, 2018, 11:43 p.m. UTC | #1
Ming Lei wrote:
> On Mon, Apr 09, 2018 at 09:30:11PM +0000, Bart Van Assche wrote:
> > Hello Ming,
> > 
> > Can you have a look at this? The start of this e-mail thread is available at
> > https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg72574.html.
> 
> Sure, thanks for your sharing.
> 
> Wakko, could you test the following patch and see if there is any
> difference?

Sure, one question, is this against 4.15 or does it matter.  Last I looked,
4.16 hasn't changed from 4.15 for that file.

> diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c
> index 0d99b242e82e..6147178f1f37 100644
> --- a/drivers/target/target_core_pscsi.c
> +++ b/drivers/target/target_core_pscsi.c
> @@ -888,7 +888,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
>  		if (len > 0 && data_len > 0) {
>  			bytes = min_t(unsigned int, len, PAGE_SIZE - off);
>  			bytes = min(bytes, data_len);
> -
> + new_bio:
>  			if (!bio) {
>  				nr_vecs = min_t(int, BIO_MAX_PAGES, nr_pages);
>  				nr_pages -= nr_vecs;
> @@ -931,6 +931,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
>  				 * be allocated with pscsi_get_bio() above.
>  				 */
>  				bio = NULL;
> +				goto new_bio;
>  			}
>  
>  			data_len -= bytes;
> 
> -- 
> Ming
Ming Lei April 9, 2018, 11:51 p.m. UTC | #2
On Mon, Apr 09, 2018 at 07:43:01PM -0400, Wakko Warner wrote:
> Ming Lei wrote:
> > On Mon, Apr 09, 2018 at 09:30:11PM +0000, Bart Van Assche wrote:
> > > Hello Ming,
> > > 
> > > Can you have a look at this? The start of this e-mail thread is available at
> > > https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg72574.html.
> > 
> > Sure, thanks for your sharing.
> > 
> > Wakko, could you test the following patch and see if there is any
> > difference?
> 
> Sure, one question, is this against 4.15 or does it matter.  Last I looked,
> 4.16 hasn't changed from 4.15 for that file.

It is two-line change, and I am sure it can be applied on 4.15 cleanly. 

Thanks,
Ming
Wakko Warner April 11, 2018, 12:45 a.m. UTC | #3
Ming Lei wrote:
> Sure, thanks for your sharing.
> 
> Wakko, could you test the following patch and see if there is any
> difference?
> 
> --
> diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c
> index 0d99b242e82e..6147178f1f37 100644
> --- a/drivers/target/target_core_pscsi.c
> +++ b/drivers/target/target_core_pscsi.c
> @@ -888,7 +888,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
>  		if (len > 0 && data_len > 0) {
>  			bytes = min_t(unsigned int, len, PAGE_SIZE - off);
>  			bytes = min(bytes, data_len);
> -
> + new_bio:
>  			if (!bio) {
>  				nr_vecs = min_t(int, BIO_MAX_PAGES, nr_pages);
>  				nr_pages -= nr_vecs;
> @@ -931,6 +931,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
>  				 * be allocated with pscsi_get_bio() above.
>  				 */
>  				bio = NULL;
> +				goto new_bio;
>  			}
>  
>  			data_len -= bytes;

Sorry for the delay.  I reverted my change, added this one.  I didn't
reboot, I just unloaded and loaded this one.
Note: /dev/sr1 as seen from the initiator is /dev/sr0 (physical disc) on the
target.

Doesn't crash, however on the initiator I see this:
[9273849.707777] ISO 9660 Extensions: RRIP_1991A
[9273863.359718] scsi_io_completion: 13 callbacks suppressed
[9273863.359788] sr 26:0:0:0: [sr1] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[9273863.359909] sr 26:0:0:0: [sr1] tag#1 Sense Key : 0x2 [current] 
[9273863.359974] sr 26:0:0:0: [sr1] tag#1 ASC=0x8 ASCQ=0x0 
[9273863.360036] sr 26:0:0:0: [sr1] tag#1 CDB: opcode=0x28 28 00 00 22 f6 96 00 00 80 00
[9273863.360116] blk_update_request: 13 callbacks suppressed
[9273863.360177] blk_update_request: I/O error, dev sr1, sector 9165400
[9273875.864648] sr 26:0:0:0: [sr1] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[9273875.864738] sr 26:0:0:0: [sr1] tag#1 Sense Key : 0x2 [current] 
[9273875.864801] sr 26:0:0:0: [sr1] tag#1 ASC=0x8 ASCQ=0x0 
[9273875.864890] sr 26:0:0:0: [sr1] tag#1 CDB: opcode=0x28 28 00 00 22 f7 16 00 00 80 00
[9273875.864971] blk_update_request: I/O error, dev sr1, sector 9165912

To cause this, I mounted the dvd as seen in the first line and ran this
command: find /cdrom2 -type f | xargs -tn1 cat > /dev/null
I did some various tests.  Each test was done after umount and mount to
clear the cache.
cat <file> > /dev/null causes the message.
dd if=<file> of=/dev/null bs=2048 doesn't
	using bs=4096 doesn't
	using bs=64k doesn't
	using bs=128k does
cat uses a blocksize of 128k.

The following was done without being mounted.
ddrescue -f -f /dev/sr1 /dev/null 
	doesn't cause the message
dd if=/dev/sr1 of=/dev/null bs=128k
	doesn't cause the message
using bs=256k causes the message once:
[9275916.857409] sr 27:0:0:0: [sr1] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[9275916.857482] sr 27:0:0:0: [sr1] tag#0 Sense Key : 0x2 [current] 
[9275916.857520] sr 27:0:0:0: [sr1] tag#0 ASC=0x8 ASCQ=0x0 
[9275916.857556] sr 27:0:0:0: [sr1] tag#0 CDB: opcode=0x28 28 00 00 00 00 00 00 00 80 00
[9275916.857614] blk_update_request: I/O error, dev sr1, sector 0

If I access the disc from the target natively either by mounting and
accessing files or working with the device directly (ie dd) no errors are
logged on the target.
Wakko Warner April 12, 2018, 12:52 a.m. UTC | #4
Wakko Warner wrote:
> Ming Lei wrote:
> > Sure, thanks for your sharing.
> > 
> > Wakko, could you test the following patch and see if there is any
> > difference?
> > 
> > --
> > diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c
> > index 0d99b242e82e..6147178f1f37 100644
> > --- a/drivers/target/target_core_pscsi.c
> > +++ b/drivers/target/target_core_pscsi.c
> > @@ -888,7 +888,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
> >  		if (len > 0 && data_len > 0) {
> >  			bytes = min_t(unsigned int, len, PAGE_SIZE - off);
> >  			bytes = min(bytes, data_len);
> > -
> > + new_bio:
> >  			if (!bio) {
> >  				nr_vecs = min_t(int, BIO_MAX_PAGES, nr_pages);
> >  				nr_pages -= nr_vecs;
> > @@ -931,6 +931,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
> >  				 * be allocated with pscsi_get_bio() above.
> >  				 */
> >  				bio = NULL;
> > +				goto new_bio;
> >  			}
> >  
> >  			data_len -= bytes;
> 
> Sorry for the delay.  I reverted my change, added this one.  I didn't
> reboot, I just unloaded and loaded this one.
> Note: /dev/sr1 as seen from the initiator is /dev/sr0 (physical disc) on the
> target.
> 
> Doesn't crash, however on the initiator I see this:
> [9273849.707777] ISO 9660 Extensions: RRIP_1991A
> [9273863.359718] scsi_io_completion: 13 callbacks suppressed
> [9273863.359788] sr 26:0:0:0: [sr1] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
> [9273863.359909] sr 26:0:0:0: [sr1] tag#1 Sense Key : 0x2 [current] 
> [9273863.359974] sr 26:0:0:0: [sr1] tag#1 ASC=0x8 ASCQ=0x0 
> [9273863.360036] sr 26:0:0:0: [sr1] tag#1 CDB: opcode=0x28 28 00 00 22 f6 96 00 00 80 00
> [9273863.360116] blk_update_request: 13 callbacks suppressed
> [9273863.360177] blk_update_request: I/O error, dev sr1, sector 9165400
> [9273875.864648] sr 26:0:0:0: [sr1] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
> [9273875.864738] sr 26:0:0:0: [sr1] tag#1 Sense Key : 0x2 [current] 
> [9273875.864801] sr 26:0:0:0: [sr1] tag#1 ASC=0x8 ASCQ=0x0 
> [9273875.864890] sr 26:0:0:0: [sr1] tag#1 CDB: opcode=0x28 28 00 00 22 f7 16 00 00 80 00
> [9273875.864971] blk_update_request: I/O error, dev sr1, sector 9165912

Just FYI: The jobs that I do that uses the disc over iscsi didn't cause any
kernel messages on either system (except for the informational when the disc
was mounted)

I have a dumb question though.  Could the label be placed just after the
'if' statement instead of before it?  bio is set to null and the 'if'
statement checks if it's null, which it always would be after the goto.
diff mbox

Patch

diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c
index 0d99b242e82e..6147178f1f37 100644
--- a/drivers/target/target_core_pscsi.c
+++ b/drivers/target/target_core_pscsi.c
@@ -888,7 +888,7 @@  pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
 		if (len > 0 && data_len > 0) {
 			bytes = min_t(unsigned int, len, PAGE_SIZE - off);
 			bytes = min(bytes, data_len);
-
+ new_bio:
 			if (!bio) {
 				nr_vecs = min_t(int, BIO_MAX_PAGES, nr_pages);
 				nr_pages -= nr_vecs;
@@ -931,6 +931,7 @@  pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
 				 * be allocated with pscsi_get_bio() above.
 				 */
 				bio = NULL;
+				goto new_bio;
 			}
 
 			data_len -= bytes;