running out of tags in 9P (was Re: [git pull] vfs part 2)
diff mbox

Message ID 20150702164926.GN17109@ZenIV.linux.org.uk
State New
Headers show

Commit Message

Al Viro July 2, 2015, 4:49 p.m. UTC
On Thu, Jul 02, 2015 at 05:43:32PM +0100, Al Viro wrote:
>  	req->tc->tag = tag-1;
> +	if (req->status != REQ_STATUS_IDLE)
> +		pr_err("using tag %d with odd status (%d)", tag, req->status);

Should be tag - 1 here, actually.

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Andrey Ryabinin July 3, 2015, 8:19 a.m. UTC | #1
On 07/02/2015 07:49 PM, Al Viro wrote:
> On Thu, Jul 02, 2015 at 05:43:32PM +0100, Al Viro wrote:
>>  	req->tc->tag = tag-1;
>> +	if (req->status != REQ_STATUS_IDLE)
>> +		pr_err("using tag %d with odd status (%d)", tag, req->status);
> 
> Should be tag - 1 here, actually.

So, with this change:

[   81.654695] 9pnet: late delivery, tag 2 already freed
[   81.655186] 9pnet: mismatched reply [tag = 2]
[   85.014795] 9pnet: late delivery, tag 4 already freed
[   85.015753] 9pnet: late delivery, tag 4 already freed
[   85.016473] 9pnet: late delivery, tag 4 already freed
[   87.275842] 9pnet: mismatched reply [tag = 5]
[   88.943223] 9pnet: late delivery, tag 4 already freed
[   88.943790] 9pnet: late delivery, tag 4 already freed
[   88.944320] 9pnet: late delivery, tag 4 already freed
[   88.945050] 9pnet: mismatched reply [tag = 4]
[  105.551220] 9pnet: late delivery, tag 3 already freed
[  105.552065] 9pnet: late delivery, tag 3 already freed
[  114.528467] 9pnet: late delivery, tag 4 already freed
[  164.791295] 9pnet: late delivery, tag 2 already freed
[  164.792156] 9pnet: using tag 2 with odd status (4)
[  164.864398] 9pnet: mismatched reply [tag = 4]
[  166.419993] 9pnet: mismatched reply [tag = 3]
[  174.062625] 9pnet: late delivery, tag 3 already freed
[  174.063121] 9pnet: late delivery, tag 3 already freed
[  174.063637] 9pnet: late delivery, tag 3 already freed
[  174.064018] 9pnet: late delivery, tag 3 already freed
[  174.064547] 9pnet: mismatched reply [tag = 3]
[  175.418729] 9pnet: late delivery, tag 3 already freed
[  177.911727] 9pnet: mismatched reply [tag = 1]
[  235.748520] 9pnet: late delivery, tag 1 already freed
[  235.749595] 9pnet: using tag 1 with odd status (4)
[  243.496782] 9pnet: late delivery, tag 6 already freed
[  243.497697] 9pnet: late delivery, tag 6 already freed
[  243.499079] 9pnet: mismatched reply [tag = 6]
[  243.736388] 9pnet: late delivery, tag 1 already freed
[  243.740019] 9pnet: late delivery, tag 1 already freed
[  248.337712] 9pnet: late delivery, tag 2 already freed
[  249.872950] 9pnet: mismatched reply [tag = 9]
[  252.448858] 9pnet: mismatched reply [tag = 8]
[  252.668878] 9pnet: mismatched reply [tag = 11]
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Al Viro July 3, 2015, 9:42 a.m. UTC | #2
On Fri, Jul 03, 2015 at 11:19:31AM +0300, Andrey Ryabinin wrote:
> On 07/02/2015 07:49 PM, Al Viro wrote:
> > On Thu, Jul 02, 2015 at 05:43:32PM +0100, Al Viro wrote:
> >>  	req->tc->tag = tag-1;
> >> +	if (req->status != REQ_STATUS_IDLE)
> >> +		pr_err("using tag %d with odd status (%d)", tag, req->status);
> > 
> > Should be tag - 1 here, actually.
> 
> So, with this change:
> 
> [   81.654695] 9pnet: late delivery, tag 2 already freed
> [   81.655186] 9pnet: mismatched reply [tag = 2]
> [   85.014795] 9pnet: late delivery, tag 4 already freed
> [   85.015753] 9pnet: late delivery, tag 4 already freed
> [   85.016473] 9pnet: late delivery, tag 4 already freed
> [   87.275842] 9pnet: mismatched reply [tag = 5]
> [   88.943223] 9pnet: late delivery, tag 4 already freed
> [   88.943790] 9pnet: late delivery, tag 4 already freed
> [   88.944320] 9pnet: late delivery, tag 4 already freed
> [   88.945050] 9pnet: mismatched reply [tag = 4]
> [  105.551220] 9pnet: late delivery, tag 3 already freed
> [  105.552065] 9pnet: late delivery, tag 3 already freed
> [  114.528467] 9pnet: late delivery, tag 4 already freed
> [  164.791295] 9pnet: late delivery, tag 2 already freed
> [  164.792156] 9pnet: using tag 2 with odd status (4)
> [  164.864398] 9pnet: mismatched reply [tag = 4]
> [  166.419993] 9pnet: mismatched reply [tag = 3]
> [  174.062625] 9pnet: late delivery, tag 3 already freed
> [  174.063121] 9pnet: late delivery, tag 3 already freed
> [  174.063637] 9pnet: late delivery, tag 3 already freed
> [  174.064018] 9pnet: late delivery, tag 3 already freed
> [  174.064547] 9pnet: mismatched reply [tag = 3]
> [  175.418729] 9pnet: late delivery, tag 3 already freed
> [  177.911727] 9pnet: mismatched reply [tag = 1]
> [  235.748520] 9pnet: late delivery, tag 1 already freed
> [  235.749595] 9pnet: using tag 1 with odd status (4)
> [  243.496782] 9pnet: late delivery, tag 6 already freed
> [  243.497697] 9pnet: late delivery, tag 6 already freed
> [  243.499079] 9pnet: mismatched reply [tag = 6]
> [  243.736388] 9pnet: late delivery, tag 1 already freed
> [  243.740019] 9pnet: late delivery, tag 1 already freed
> [  248.337712] 9pnet: late delivery, tag 2 already freed
> [  249.872950] 9pnet: mismatched reply [tag = 9]
> [  252.448858] 9pnet: mismatched reply [tag = 8]
> [  252.668878] 9pnet: mismatched reply [tag = 11]

FWIW, I'd reproduced it here as well.  With the addition of checking in
p9_client_cb() whether the tag is already freed in IDR, reporting ->status
when it's not "sent" and reporting Tflush, I've got this:

[ 2754.713015] 9pnet: flushing 1
[ 2755.516123] 9pnet: flush 1 [3]done
[ 2758.443265] 9pnet: flushing 16
[ 2768.655768] 9pnet: flush 16 [8]done
[ 2783.939538] 9pnet: flushing 30
[ 2786.067856] 9pnet: flush 30 [3]done
[ 2809.784119] 9pnet: [ffff880113213000] delivery in 0, tag 1
[ 2809.948681] 9pnet: [ffff880113213000] using tag 1 with odd status (4)

Here we have delivery when that sucker is into p9_free_req(), but hasn't
removed from IDR yet.

[ 2809.948681] 9pnet: [ffff880113213000] delivery in 0, tag 1
[ 2816.767861] 9pnet: [ffff880113213000] using tag 1 with odd status (4)

Ditto.

[ 2816.767861] 9pnet: flushing 4
[ 2816.769484] 9pnet: flush 4 [5]done
[ 2846.327753] 9pnet: flushing 3
[ 2854.876131] 9pnet: flush 3 [4]done
[ 2856.492801] 9pnet: flushing 6
[ 2857.675642] 9pnet: flush 6 [1]done
[ 2860.051701] 9pnet: [ffff880113213000] delivery in 0, tag 1
[ 2860.052941] 9pnet: [ffff880113213000] late delivery, tag 1 already freed in IDR
[ 2860.680181] 9pnet: [ffff880113213000] using tag 1 with odd status (4)

This has hit between p9_free_req() and reallocating that request.

[ 2909.911815] 9pnet: [ffff880113213000] delivery in 4, tag 1
[ 2909.913145] 9pnet: [ffff880113213000] late delivery, tag 1 already freed in IDR

... while _this_ one has raced with p9_free_req() in a different way.

[ 2910.852202] 9pnet: flushing 3
[ 2917.985941] 9pnet: flush 3 [4]done
[ 2937.600652] 9pnet: flushing 2
[ 2939.775354] 9pnet: flush 2 [4]done
[ 2961.521600] 9pnet: [ffff880113213000] delivery in 0, tag 1
[ 2962.320383] 9pnet: [ffff880113213000] using tag 1 with odd status (4)

Same at the very first one.

[ 2962.320383] 9pnet: [ffff880113213000] delivery in 4, tag 8
[ 3001.578372] 9pnet: [ffff880113213000] delivery in 4, tag 6

Those had hit before p9_free_req().

AFAICS, we get occasional stray responses from somewhere.  And no, it doesn't
seem to be related to flushes or to dropping chan->lock in req_done() (this
run had been with chan->lock taken on the outside of the loop).

What I really don't understand is WTF is it playing with p9_tag_lookup() -
it's stashing req->tc via virtqueue_add_sgs() opaque data argument, fetches
it back in req_done(), then picks ->tag from it and uses p9_tag_lookup() to
find req.  Why not simply pass req instead?  I had been wrong about that
p9_tag_lookup() being able to return NULL, but why bother with it at all?
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch
diff mbox

diff --git a/net/9p/client.c b/net/9p/client.c
index 6f4c4c8..9719886 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -248,6 +248,9 @@  p9_tag_alloc(struct p9_client *c, u16 tag, unsigned int max_size)
 		/* check again since original check was outside of lock */
 		while (tag >= c->max_tag) {
 			row = (tag / P9_ROW_MAXTAG);
+
+			WARN_ON_ONCE(c->reqs[row]);	// are we about to leak?
+
 			c->reqs[row] = kcalloc(P9_ROW_MAXTAG,
 					sizeof(struct p9_req_t), GFP_ATOMIC);
 
@@ -286,6 +289,8 @@  p9_tag_alloc(struct p9_client *c, u16 tag, unsigned int max_size)
 	p9pdu_reset(req->rc);
 
 	req->tc->tag = tag-1;
+	if (req->status != REQ_STATUS_IDLE)
+		pr_err("using tag %d with odd status (%d)", tag - 1, req->status);
 	req->status = REQ_STATUS_ALLOC;
 
 	return req;
@@ -425,6 +430,8 @@  void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
 	 * the other thread wakes up will indeed be seen by the waiting side.
 	 */
 	smp_wmb();
+	if (req->status == REQ_STATUS_IDLE)
+		pr_err("late delivery, tag %d already freed", req->tc->tag);
 	req->status = status;
 
 	wake_up(req->wq);
@@ -693,6 +700,10 @@  static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
 		tag = p9_idpool_get(c->tagpool);
 		if (tag < 0)
 			return ERR_PTR(-ENOMEM);
+		if (WARN_ON_ONCE(tag != (u16)tag)) {	// wrapped around?
+			p9_idpool_put(tag, c->tagpool);
+			return ERR_PTR(-ENOMEM);
+		}
 	}
 
 	req = p9_tag_alloc(c, tag, req_size);
@@ -1647,7 +1658,10 @@  p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err)
 		if (*err) {
 			trace_9p_protocol_dump(clnt, req->rc);
 			p9_free_req(clnt, req);
+			break;
 		}
+		if (rsize < count)
+			pr_err("mismatched reply [tag = %d]\n", req->tc->tag);
 
 		p9_debug(P9_DEBUG_9P, "<<< RWRITE count %d\n", count);