Message ID | 1533157038-3924-18-git-send-email-bmarzins@redhat.com (mailing list archive)
---|---
State | Not Applicable, archived
Delegated to: | christophe varoqui
Series | Misc Multipath patches
On Wed, 2018-08-01 at 15:57 -0500, Benjamin Marzinski wrote:
> I realize that this locking was added to make an analyzer happy,
> presumably because sometimes ct->devt was being accessed while ct->lock
> was held, and sometimes not. The tur checker locking will be cleaned
> up in a later patch to deal with this.

Are you perhaps referring to commit 873be9fef222 ("libmultipath/checkers/tur:
Serialize tur_checker_context.devt accesses")? Your assumption about how DRD
works is wrong. DRD doesn't care about which mutex or other synchronization
primitives are used to serialize accesses to data that is shared between
threads. All it cares about is that concurrent accesses to shared data are
serialized. I'm afraid that your patch reintroduces the race conditions that
were fixed by commit 873be9fef222. Have you considered fixing this without
removing the locking?

Bart.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
On Sun, Aug 05, 2018 at 03:40:30PM +0000, Bart Van Assche wrote:
> On Wed, 2018-08-01 at 15:57 -0500, Benjamin Marzinski wrote:
> > I realize that this locking was added to make an analyzer happy,
> > presumably because sometimes ct->devt was being accessed while ct->lock
> > was held, and sometimes not. The tur checker locking will be cleaned
> > up in a later patch to deal with this.
>
> Are you perhaps referring to commit 873be9fef222 ("libmultipath/checkers/tur:
> Serialize tur_checker_context.devt accesses")? Your assumption about how DRD
> works is wrong. DRD doesn't care about which mutex or other synchronization
> primitives are used to serialize accesses to data that is shared between
> threads. All it cares about is that concurrent accesses to shared data are
> serialized. I'm afraid that your patch reintroduces the race conditions that
> were fixed by commit 873be9fef222. Have you considered fixing this without
> removing the locking?

I'm confused here. If the data is only read by multiple threads, I don't
see why we need to synchronize it. Unless I'm missing something, my
patch does make sure that the only time ct->devt is written is when
there is only one thread running that has access to ct.

-	if (fstat(c->fd, &sb) == 0) {
	...
+	if (uatomic_read(&ct->holders) == 1 && ct->devt[0] == 0 &&
+	    fstat(c->fd, &sb) == 0) {

holders will always be 2 when a tur checker thread is running (well, the
thread will decrement it when it finishes, but it will not touch ct
after that action). Since only the thread calling libcheck_init can
start the tur checker thread using this context, if there is no thread
running at the start of libcheck_check(), then no thread can be started
until libcheck_check() starts it.

Ideally, ct->devt would be set in libcheck_init(), but we don't have the
information there, so setting it once, the first time we call
libcheck_check(), seems like a reasonable, and safe, answer. I certainly
prefer it to adding recursive locking where it isn't needed.

-Ben
On Mon, 2018-08-06 at 17:55 -0500, Benjamin Marzinski wrote:
> I'm confused here. If the data is only read by multiple threads, I don't
> see why we need to synchronize it. Unless I'm missing something, my
> patch does make sure that the only time ct->devt is written is when
> there is only one thread running that has access to ct.
>
> -	if (fstat(c->fd, &sb) == 0) {
> 	...
> +	if (uatomic_read(&ct->holders) == 1 && ct->devt[0] == 0 &&
> +	    fstat(c->fd, &sb) == 0) {
>
> holders will always be 2 when a tur checker thread is running (well, the
> thread will decrement it when it finishes, but it will not touch ct
> after that action). Since only the thread calling libcheck_init can
> start the tur checker thread using this context, if there is no thread
> running at the start of libcheck_check(), then no thread can be started
> until libcheck_check() starts it.
>
> Ideally, ct->devt would be set in libcheck_init(), but we don't have the
> information there, so setting it once, the first time we call
> libcheck_check(), seems like a reasonable, and safe, answer. I certainly
> prefer it to adding recursive locking where it isn't needed.

Replacing a locking scheme with an approach based on atomic variables is
something I don't like because it makes the code harder to analyze for
correctness. Have you considered keeping the locking and fixing the bug
that you described in the commit message?

Thanks,

Bart.
diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
index 275541f..bfb0907 100644
--- a/libmultipath/checkers/tur.c
+++ b/libmultipath/checkers/tur.c
@@ -37,32 +37,19 @@
 #define MSG_TUR_FAILED "tur checker failed to initialize"
 
 struct tur_checker_context {
-	dev_t devt;
+	char devt[32];
 	int state;
-	int running;
+	int running; /* uatomic access only */
 	int fd;
 	unsigned int timeout;
 	time_t time;
 	pthread_t thread;
 	pthread_mutex_t lock;
 	pthread_cond_t active;
-	int holders;
+	int holders; /* uatomic access only */
 	char message[CHECKER_MSG_LEN];
 };
 
-static const char *tur_devt(char *devt_buf, int size,
-			    struct tur_checker_context *ct)
-{
-	dev_t devt;
-
-	pthread_mutex_lock(&ct->lock);
-	devt = ct->devt;
-	pthread_mutex_unlock(&ct->lock);
-
-	snprintf(devt_buf, size, "%d:%d", major(devt), minor(devt));
-	return devt_buf;
-}
-
 int libcheck_init (struct checker * c)
 {
 	struct tur_checker_context *ct;
@@ -232,14 +219,12 @@ static void *tur_thread(void *ctx)
 {
 	struct tur_checker_context *ct = ctx;
 	int state, running;
-	char devt[32];
 
 	/* This thread can be canceled, so setup clean up */
 	tur_thread_cleanup_push(ct);
 	rcu_register_thread();
 
-	condlog(3, "%s: tur checker starting up",
-		tur_devt(devt, sizeof(devt), ct));
+	condlog(3, "%s: tur checker starting up", ct->devt);
 
 	/* TUR checker start up */
 	pthread_mutex_lock(&ct->lock);
@@ -256,8 +241,8 @@ static void *tur_thread(void *ctx)
 	pthread_cond_signal(&ct->active);
 	pthread_mutex_unlock(&ct->lock);
 
-	condlog(3, "%s: tur checker finished, state %s",
-		tur_devt(devt, sizeof(devt), ct), checker_state_name(state));
+	condlog(3, "%s: tur checker finished, state %s", ct->devt,
+		checker_state_name(state));
 
 	running = uatomic_xchg(&ct->running, 0);
 	if (!running)
@@ -308,15 +293,14 @@ int libcheck_check(struct checker * c)
 	struct stat sb;
 	pthread_attr_t attr;
 	int tur_status, r;
-	char devt[32];
 
 	if (!ct)
 		return PATH_UNCHECKED;
 
-	if (fstat(c->fd, &sb) == 0) {
-		pthread_mutex_lock(&ct->lock);
-		ct->devt = sb.st_rdev;
-		pthread_mutex_unlock(&ct->lock);
+	if (uatomic_read(&ct->holders) == 1 && ct->devt[0] == 0 &&
+	    fstat(c->fd, &sb) == 0) {
+		snprintf(ct->devt, sizeof(ct->devt), "%d:%d", major(sb.st_rdev),
+			 minor(sb.st_rdev));
 	}
 
 	if (c->sync)
@@ -327,8 +311,7 @@ int libcheck_check(struct checker * c)
 	 */
 	r = pthread_mutex_lock(&ct->lock);
 	if (r != 0) {
-		condlog(2, "%s: tur mutex lock failed with %d",
-			tur_devt(devt, sizeof(devt), ct), r);
+		condlog(2, "%s: tur mutex lock failed with %d", ct->devt, r);
 		MSG(c, MSG_TUR_FAILED);
 		return PATH_WILD;
 	}
@@ -338,14 +321,12 @@ int libcheck_check(struct checker * c)
 			int running = uatomic_xchg(&ct->running, 0);
 			if (running)
 				pthread_cancel(ct->thread);
-			condlog(3, "%s: tur checker timeout",
-				tur_devt(devt, sizeof(devt), ct));
+			condlog(3, "%s: tur checker timeout", ct->devt);
 			ct->thread = 0;
 			MSG(c, MSG_TUR_TIMEOUT);
 			tur_status = PATH_TIMEOUT;
 		} else if (uatomic_read(&ct->running) != 0) {
-			condlog(3, "%s: tur checker not finished",
-				tur_devt(devt, sizeof(devt), ct));
+			condlog(3, "%s: tur checker not finished", ct->devt);
 			tur_status = PATH_PENDING;
 		} else {
 			/* TUR checker done */
@@ -361,7 +342,7 @@ int libcheck_check(struct checker * c)
 			if (ct->state == PATH_PENDING) {
 				pthread_mutex_unlock(&ct->lock);
 				condlog(3, "%s: tur thread not responding",
-					tur_devt(devt, sizeof(devt), ct));
+					ct->devt);
 				return PATH_TIMEOUT;
 			}
 		}
@@ -381,7 +362,7 @@ int libcheck_check(struct checker * c)
 			ct->thread = 0;
 			pthread_mutex_unlock(&ct->lock);
 			condlog(3, "%s: failed to start tur thread, using"
-				" sync mode", tur_devt(devt, sizeof(devt), ct));
+				" sync mode", ct->devt);
 			return tur_check(c->fd, c->timeout, copy_msg_to_checker, c);
 		}
@@ -392,8 +373,7 @@ int libcheck_check(struct checker * c)
 		pthread_mutex_unlock(&ct->lock);
 		if (uatomic_read(&ct->running) != 0 &&
 		    (tur_status == PATH_PENDING || tur_status == PATH_UNCHECKED)) {
-			condlog(3, "%s: tur checker still running",
-				tur_devt(devt, sizeof(devt), ct));
+			condlog(3, "%s: tur checker still running", ct->devt);
 			tur_status = PATH_PENDING;
 		} else {
 			int running = uatomic_xchg(&ct->running, 0);
tur_devt() locks ct->lock. However, it is occasionally called while
ct->lock is already locked. In reality, there is no reason to lock all
accesses to ct->devt. The tur checker only needs to write to this
variable once, when it first gets the file descriptor that it is
checking. It also never uses ct->devt directly. Instead, it always
grabs the major and minor numbers and turns them into a string. This
patch changes ct->devt into that string, and only sets it if it hasn't
been set before and no other thread is holding the context. After that,
the value is only read, and so doesn't need locking.

I realize that this locking was added to make an analyzer happy,
presumably because sometimes ct->devt was being accessed while ct->lock
was held, and sometimes not. The tur checker locking will be cleaned
up in a later patch to deal with this.

Cc: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/checkers/tur.c | 52 ++++++++++++++------------------------------
 1 file changed, 16 insertions(+), 36 deletions(-)