Message ID: 20160216233609.GE17997@ZenIV.linux.org.uk (mailing list archive)
State: New, archived
On Tue, Feb 16, 2016 at 11:36:09PM +0000, Al Viro wrote:
> I strongly suspect that this is what's missing. Could you check if it helps?
BTW, I've pushed #orangefs-untested; the differences are
* several fixes folded back into relevant commits (no point creating
bisect hazards and making the history harder to read)
* this thing also folded (and in fact is what you get from diffing
that tree with your for-next)
* commit message for bufmap rewrite supplied.
Could you (assuming it works, etc.) switch your branch to that?
git checkout for-next, git fetch from vfs.git, git diff FETCH_HEAD to
verify that the delta is as it should be, then git reset --hard FETCH_HEAD
and git push --force <whatever remote you are using> for-next would do it...
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
It is still busted, I've been trying to find clues as to why...

Maybe this is relevant:

Alloced OP ffff880015698000 <- doomed op for orangefs_create MAILBOX2.CPT
service_operation: orangefs_create op ffff880015698000
ffff880015698000 got past is_daemon_in_service

... lots of stuff ...

w_f_m_d returned -11 for ffff880015698000 <- first op to get EAGAIN

first client core is NOT in service
second op to get EAGAIN
...
last client core is NOT in service

... lots of stuff ...

service_operation returns to orangef_create with handle 0 fsid 0 ret 0
for MAILBOX2.CPT

I'm guessing you want me to wait to do the switching of my branch
until we fix this (last?) thing, let me know...

-Mike

On Tue, Feb 16, 2016 at 6:54 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Tue, Feb 16, 2016 at 11:36:09PM +0000, Al Viro wrote:
>
>> I strongly suspect that this is what's missing. Could you check if it helps?
>
> BTW, I've pushed #orangefs-untested; the differences are
> 	* several fixes folded back into relevant commits (no point creating
> bisect hazards and making the history harder to read)
> 	* this thing also folded (and in fact is what you get from diffing
> that tree with your for-next)
> 	* commit message for bufmap rewrite supplied.
>
> Could you (assuming it works, etc.) switch your branch to that?
> git checkout for-next, git fetch from vfs.git, git diff FETCH_HEAD to
> verify that the delta is as it should be, then git reset --hard FETCH_HEAD
> and git push --force <whatever remote you are using> for-next would do it...
On Wed, Feb 17, 2016 at 02:24:34PM -0500, Mike Marshall wrote:
> It is still busted, I've been trying to find clues as to why...

With reinit_completion() added?

> Maybe this is relevant:
>
> Alloced OP ffff880015698000 <- doomed op for orangefs_create MAILBOX2.CPT
> service_operation: orangefs_create op ffff880015698000
> ffff880015698000 got past is_daemon_in_service
>
> ... lots of stuff ...
>
> w_f_m_d returned -11 for ffff880015698000 <- first op to get EAGAIN
>
> first client core is NOT in service
> second op to get EAGAIN
> ...
> last client core is NOT in service
>
> ... lots of stuff ...
>
> service_operation returns to orangef_create with handle 0 fsid 0 ret 0
> for MAILBOX2.CPT
>
> I'm guessing you want me to wait to do the switching of my branch
> until we fix this (last?) thing, let me know...

What I'd like to check is the value of op->waitq.done at retry_servicing.
If we get there with a non-zero value, we've a problem.  BTW, do you
hit any of gossip_err() in orangefs_clean_up_interrupted_operation()?
On Wed, Feb 17, 2016 at 08:11:20PM +0000, Al Viro wrote:
> With reinit_completion() added?
>
> > Maybe this is relevant:
> >
> > Alloced OP ffff880015698000 <- doomed op for orangefs_create MAILBOX2.CPT
> > service_operation: orangefs_create op ffff880015698000
> > ffff880015698000 got past is_daemon_in_service
> >
> > ... lots of stuff ...
> >
> > w_f_m_d returned -11 for ffff880015698000 <- first op to get EAGAIN
> >
> > first client core is NOT in service
> > second op to get EAGAIN
> > ...
> > last client core is NOT in service
> >
> > ... lots of stuff ...
> >
> > service_operation returns to orangef_create with handle 0 fsid 0 ret 0
> > for MAILBOX2.CPT
> >
> > I'm guessing you want me to wait to do the switching of my branch
> > until we fix this (last?) thing, let me know...
>
> What I'd like to check is the value of op->waitq.done at retry_servicing.
> If we get there with a non-zero value, we've a problem.  BTW, do you
> hit any of gossip_err() in orangefs_clean_up_interrupted_operation()?

Am I right assuming that you are seeing zero from service_operation()
for ORANGEFS_VFS_OP_CREATE with zeroed ->downcall.resp.create.refn?
AFAICS, that can only happen with wait_for_matching_downcall() returning
0, which means op_state_serviced(op).  And prior to that we had
set_op_state_waiting(op), which means set_op_state_serviced(op) done between
those two...

So unless you have something really nasty going on (buggered op refcounting,
memory corruption, etc.), we had orangefs_devreq_write_iter() pick that
op and copy the entire op->downcall from userland.  With op->downcall.status
not set to -EFAULT, or we would've...  Oh, shit - it's fed through
orangefs_normalize_to_errno().  OTOH, it would've hit
gossip_err("orangefs: orangefs_normalize_to_errno: got error code which is not
from ORANGEFS.\n") and presumably you would've noticed that.  And it still
would've returned that -EFAULT intact, actually.
In any case, it's a bug - we need
	ret = -(ORANGEFS_ERROR_BIT|9);
	goto Broken;
instead of those
	ret = -EFAULT;
	goto Broken;
and
	ret = -(ORANGEFS_ERROR_BIT|8);
	goto Broken;
instead of
	ret = -ENOMEM;
	goto Broken;
in orangefs_devreq_write_iter().

Anyway, it looks like we didn't go through Broken: in there, so copy_from_user()
must've been successful.

Hmm...  Could the daemon have really given that reply?  Relevant test would
be something like
	WARN_ON(op->upcall.type == ORANGEFS_VFS_OP_CREATE &&
		!op->downcall.resp.create.refn.fsid);
right after
wakeup:
	/*
	 * tell the vfs op waiting on a waitqueue
	 * that this op is done
	 */
	spin_lock(&op->lock);
	if (unlikely(op_state_given_up(op))) {
		spin_unlock(&op->lock);
		goto out;
	}
	set_op_state_serviced(op);

and plain WARN_ON(1) right after Broken:

If that op goes through either of those set_op_state_serviced() with that
zero refn.fsid, we'll see a WARN_ON triggered...
> with reinit_completion() added?

In the right place, even...

> Am I right assuming that you are seeing zero from service_operation()
> for ORANGEFS_VFS_OP_CREATE with zeroed ->downcall.resp.create.refn?
> AFAICS, that can only happen with wait_for_matching_downcall() returning
> 0, which means op_state_serviced(op).

Yes sir:

Feb 17 12:25:15 be1 kernel: [  857.901225] service_operation: wait_for_matching_downcall returned 0 for ffff880015698000
Feb 17 12:25:15 be1 kernel: [  857.901228] orangefs: service_operation orangefs_create returning: 0 for ffff880015698000.
Feb 17 12:25:15 be1 kernel: [  857.901230] orangefs_create: MAILBOX2.CPT: handle:00000000-0000-0000-0000-000000000000: fsid:0: new_op:ffff880015698000: ret:0:

I'll get those WARN_ONs in there and do some more tests.  Martin thinks
he might be on to a solution... he'll probably send a response soon...
I can hear him typing at 500 characters a minute a few cubes away...

-Mike

On Wed, Feb 17, 2016 at 4:17 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Wed, Feb 17, 2016 at 08:11:20PM +0000, Al Viro wrote:
>> With reinit_completion() added?
>>
>> > Maybe this is relevant:
>> >
>> > Alloced OP ffff880015698000 <- doomed op for orangefs_create MAILBOX2.CPT
>> > service_operation: orangefs_create op ffff880015698000
>> > ffff880015698000 got past is_daemon_in_service
>> >
>> > ... lots of stuff ...
>> >
>> > w_f_m_d returned -11 for ffff880015698000 <- first op to get EAGAIN
>> >
>> > first client core is NOT in service
>> > second op to get EAGAIN
>> > ...
>> > last client core is NOT in service
>> >
>> > ... lots of stuff ...
>> >
>> > service_operation returns to orangef_create with handle 0 fsid 0 ret 0
>> > for MAILBOX2.CPT
>> >
>> > I'm guessing you want me to wait to do the switching of my branch
>> > until we fix this (last?) thing, let me know...
>>
>> What I'd like to check is the value of op->waitq.done at retry_servicing.
>> If we get there with a non-zero value, we've a problem.
>> BTW, do you hit any of gossip_err() in
>> orangefs_clean_up_interrupted_operation()?
>
> Am I right assuming that you are seeing zero from service_operation()
> for ORANGEFS_VFS_OP_CREATE with zeroed ->downcall.resp.create.refn?
> AFAICS, that can only happen with wait_for_matching_downcall() returning
> 0, which means op_state_serviced(op).  And prior to that we had
> set_op_state_waiting(op), which means set_op_state_serviced(op) done between
> those two...
>
> So unless you have something really nasty going on (buggered op refcounting,
> memory corruption, etc.), we had orangefs_devreq_write_iter() pick that
> op and copy the entire op->downcall from userland.  With op->downcall.status
> not set to -EFAULT, or we would've...  Oh, shit - it's fed through
> orangefs_normalize_to_errno().  OTOH, it would've hit
> gossip_err("orangefs: orangefs_normalize_to_errno: got error code which is not
> from ORANGEFS.\n") and presumably you would've noticed that.  And it still
> would've returned that -EFAULT intact, actually.  In any case, it's a bug -
> we need
> 	ret = -(ORANGEFS_ERROR_BIT|9);
> 	goto Broken;
> instead of those
> 	ret = -EFAULT;
> 	goto Broken;
> and
> 	ret = -(ORANGEFS_ERROR_BIT|8);
> 	goto Broken;
> instead of
> 	ret = -ENOMEM;
> 	goto Broken;
> in orangefs_devreq_write_iter().
>
> Anyway, it looks like we didn't go through Broken: in there, so copy_from_user()
> must've been successful.
>
> Hmm...  Could the daemon have really given that reply?
> Relevant test would be something like
> 	WARN_ON(op->upcall.type == ORANGEFS_VFS_OP_CREATE &&
> 		!op->downcall.resp.create.refn.fsid);
> right after
> wakeup:
> 	/*
> 	 * tell the vfs op waiting on a waitqueue
> 	 * that this op is done
> 	 */
> 	spin_lock(&op->lock);
> 	if (unlikely(op_state_given_up(op))) {
> 		spin_unlock(&op->lock);
> 		goto out;
> 	}
> 	set_op_state_serviced(op);
>
> and plain WARN_ON(1) right after Broken:
>
> If that op goes through either of those set_op_state_serviced() with that
> zero refn.fsid, we'll see a WARN_ON triggered...
On Wed, 17 Feb 2016, Mike Marshall wrote:
> It is still busted, I've been trying to find clues as to why...
>
> Maybe this is relevant:
>
> Alloced OP ffff880015698000 <- doomed op for orangefs_create MAILBOX2.CPT
> service_operation: orangefs_create op ffff880015698000
> ffff880015698000 got past is_daemon_in_service
>
> ... lots of stuff ...
>
> w_f_m_d returned -11 for ffff880015698000 <- first op to get EAGAIN
>
> first client core is NOT in service
> second op to get EAGAIN
> ...
> last client core is NOT in service
>
> ... lots of stuff ...
>
> service_operation returns to orangef_create with handle 0 fsid 0 ret 0
> for MAILBOX2.CPT
>
> I'm guessing you want me to wait to do the switching of my branch
> until we fix this (last?) thing, let me know...
>
> -Mike

I think I've identified something screwy.

Some process creates a file.  Eventually we get into
wait_for_matching_downcall with no client-core.  W_f_m_d returns EAGAIN
and op->lock is held.  The op is still waiting and in
orangefs_request_list.  Service_operation calls
orangefs_clean_up_interrupted_operation, which attempts to remove the op
from orangefs_request_list.

Meanwhile the client-core comes back and does a read.  W_f_m_d has
returned EAGAIN, but the op is still in orangefs_request_list, so it
gets passed to the client-core.  Now the op is in service and in
htable_ops_in_progress.  But service_operation is about to retry it
under the impression that it was purged.  So it puts the op back in
orangefs_request_list.  Then the client-core returns the op, so it is
marked serviced and returned to orangefs_inode_create.

Meanwhile something or other (great theory right? now I'm less sure)
happens with the second request (they have the same tag) causing it to
become corrupted.  I admit it starts to fall apart at the end, and I
don't have a clear theory on how this produces what we see.
In orangefs_clean_up_interrupted_operation

	if (op_state_waiting(op)) {
		/*
		 * upcall hasn't been read; remove op from upcall request
		 * list.
		 */
		spin_unlock(&op->lock);
		/* HERE */
		spin_lock(&orangefs_request_list_lock);
		list_del(&op->list);
		spin_unlock(&orangefs_request_list_lock);
		gossip_debug(GOSSIP_WAIT_DEBUG,
			     "Interrupted: Removed op %p from request_list\n",
			     op);
	} else if (op_state_in_progress(op)) {

and orangefs_devreq_read

restart:
	/* Get next op (if any) from top of list. */
	spin_lock(&orangefs_request_list_lock);
	list_for_each_entry_safe(op, temp, &orangefs_request_list, list) {
		__s32 fsid;

		/* This lock is held past the end of the loop when we break. */
		/* HERE */
		spin_lock(&op->lock);
		if (unlikely(op_state_purged(op))) {
			spin_unlock(&op->lock);
			continue;
		}

I think both processes can end up working on the same op.

-- 
Martin
diff --git a/fs/orangefs/waitqueue.c b/fs/orangefs/waitqueue.c
index 2539813..36eedd6 100644
--- a/fs/orangefs/waitqueue.c
+++ b/fs/orangefs/waitqueue.c
@@ -244,6 +244,7 @@ static void orangefs_clean_up_interrupted_operation(struct orangefs_kernel_op_s
 		gossip_err("%s: can't get here.\n", __func__);
 		spin_unlock(&op->lock);
 	}
+	reinit_completion(&op->waitq);
 }
 
 /*