
[17/32] libmultipath: fix tur checker double locking

Message ID 1533157038-3924-18-git-send-email-bmarzins@redhat.com (mailing list archive)
State Not Applicable, archived
Delegated to: christophe varoqui
Series: Misc Multipath patches

Commit Message

Benjamin Marzinski Aug. 1, 2018, 8:57 p.m. UTC
tur_devt() locks ct->lock. However, it is occasionally called while
ct->lock is already locked. In reality, there is no reason why we need
to lock all the accesses to ct->devt. The tur checker only needs to
write to this variable one time, when it first gets the file descriptor
that it is checking.  It also never uses ct->devt directly. Instead, it
always grabs the major and minor numbers and turns them into a string.
This patch changes ct->devt into that string, and only sets it if it
hasn't been set before and no other thread is holding the context. After
that, the value is only read, and so doesn't need locking.

I realize that this locking was added to make an analyzer happy,
presumably because sometimes ct->devt was being accessed while ct->lock
was held, and sometimes not.  The tur checker locking will be cleaned
up in a later patch to deal with this.

Cc: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/checkers/tur.c | 52 ++++++++++++++-------------------------------
 1 file changed, 16 insertions(+), 36 deletions(-)

Comments

Bart Van Assche Aug. 5, 2018, 3:40 p.m. UTC | #1
On Wed, 2018-08-01 at 15:57 -0500, Benjamin Marzinski wrote:
> I realize that this locking was added to make an analyzer happy,
> presumably because sometimes ct->devt was being accessed while ct->lock
> was held, and sometimes not.  The tur checker locking will be cleaned
> up in a later patch to deal with this.

Are you perhaps referring to commit 873be9fef222 ("libmultipath/checkers/tur:
Serialize tur_checker_context.devt accesses")? Your assumption about how DRD
works is wrong. DRD doesn't care about which mutex or other synchronization
primitives are used to serialize accesses to data that is shared between
threads. All it cares about is that concurrent accesses to shared data are
serialized. I'm afraid that your patch reintroduces the race conditions that
were fixed by commit 873be9fef222. Have you considered fixing this without
removing the locking?

Bart.
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Benjamin Marzinski Aug. 6, 2018, 10:55 p.m. UTC | #2
On Sun, Aug 05, 2018 at 03:40:30PM +0000, Bart Van Assche wrote:
> On Wed, 2018-08-01 at 15:57 -0500, Benjamin Marzinski wrote:
> > I realize that this locking was added to make an analyzer happy,
> > presumably because sometimes ct->devt was being accessed while ct->lock
> > was held, and sometimes not.  The tur checker locking will be cleaned
> > up in a later patch to deal with this.
> 
> Are you perhaps referring to commit 873be9fef222 ("libmultipath/checkers/tur:
> Serialize tur_checker_context.devt accesses")? Your assumption about how DRD
> works is wrong. DRD doesn't care about which mutex or other synchronization
> primitives are used to serialize accesses to data that is shared between
> threads. All it cares about is that concurrent accesses to shared data are
> serialized. I'm afraid that your patch reintroduces the race conditions that
> were fixed by commit 873be9fef222. Have you considered fixing this without
> removing the locking?

I'm confused here. If the data is only read by multiple threads, I don't
see why we need to synchronize it. Unless I'm missing something, my
patch does make sure that the only time ct->devt is written is when
there is only one thread running that has access to ct.

-       if (fstat(c->fd, &sb) == 0) {
...
+       if (uatomic_read(&ct->holders) == 1 && ct->devt[0] == 0 &&
+           fstat(c->fd, &sb) == 0) {

holders will always be 2 when a tur checker thread is running (well, the
thread will decrement it when it finishes, but it will not touch ct
after that action). Since only the thread calling libcheck_init can
start the tur checker thread using this context, if there is no thread
running at the start of libcheck_check(), then no thread can be started
until libcheck_check() starts it.

Ideally, ct->devt would be set in libcheck_init(), but we don't have the
information there, so setting it once, on the first time we call
libcheck_check() seems like a reasonable, and safe, answer.  I certainly
prefer it to adding recursive locking where it isn't needed.

-Ben

> 
> Bart.

Bart Van Assche Aug. 7, 2018, 9:45 p.m. UTC | #3
On Mon, 2018-08-06 at 17:55 -0500, Benjamin Marzinski wrote:
> I'm confused here. If the data is only read by multiple threads, I don't
> see why we need to synchronize it. Unless I'm missing something, my
> patch does make sure that the only time ct->devt is written is when
> there is only one thread running that has access to ct.
> 
> -       if (fstat(c->fd, &sb) == 0) {
> ...
> +       if (uatomic_read(&ct->holders) == 1 && ct->devt[0] == 0 &&
> +           fstat(c->fd, &sb) == 0) {
> 
> holders will always be 2 when a tur checker thread is running (well, the
> thread will decrement it when it finishes, but it will not touch ct
> after that action). Since only the thread calling libcheck_init can
> start the tur checker thread using this context, if there is no thread
> running at the start of libcheck_check(), then no thread can be started
> until libcheck_check() starts it.
> 
> Ideally, ct->devt would be set in libcheck_init(), but we don't have the
> information there, so setting it once, on the first time we call
> libcheck_check() seems like a reasonable, and safe, answer.  I certainly
> prefer it to adding recursive locking where it isn't needed.

Replacing a locking scheme with an approach based on atomic variables is
something I don't like because it makes the code harder to analyze for
correctness. Have you considered keeping the locking and fixing the bug
that you described in the commit message?

Thanks,

Bart.



Patch

diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
index 275541f..bfb0907 100644
--- a/libmultipath/checkers/tur.c
+++ b/libmultipath/checkers/tur.c
@@ -37,32 +37,19 @@ 
 #define MSG_TUR_FAILED	"tur checker failed to initialize"
 
 struct tur_checker_context {
-	dev_t devt;
+	char devt[32];
 	int state;
-	int running;
+	int running; /* uatomic access only */
 	int fd;
 	unsigned int timeout;
 	time_t time;
 	pthread_t thread;
 	pthread_mutex_t lock;
 	pthread_cond_t active;
-	int holders;
+	int holders; /* uatomic access only */
 	char message[CHECKER_MSG_LEN];
 };
 
-static const char *tur_devt(char *devt_buf, int size,
-			    struct tur_checker_context *ct)
-{
-	dev_t devt;
-
-	pthread_mutex_lock(&ct->lock);
-	devt = ct->devt;
-	pthread_mutex_unlock(&ct->lock);
-
-	snprintf(devt_buf, size, "%d:%d", major(devt), minor(devt));
-	return devt_buf;
-}
-
 int libcheck_init (struct checker * c)
 {
 	struct tur_checker_context *ct;
@@ -232,14 +219,12 @@  static void *tur_thread(void *ctx)
 {
 	struct tur_checker_context *ct = ctx;
 	int state, running;
-	char devt[32];
 
 	/* This thread can be canceled, so setup clean up */
 	tur_thread_cleanup_push(ct);
 	rcu_register_thread();
 
-	condlog(3, "%s: tur checker starting up",
-		tur_devt(devt, sizeof(devt), ct));
+	condlog(3, "%s: tur checker starting up", ct->devt);
 
 	/* TUR checker start up */
 	pthread_mutex_lock(&ct->lock);
@@ -256,8 +241,8 @@  static void *tur_thread(void *ctx)
 	pthread_cond_signal(&ct->active);
 	pthread_mutex_unlock(&ct->lock);
 
-	condlog(3, "%s: tur checker finished, state %s",
-		tur_devt(devt, sizeof(devt), ct), checker_state_name(state));
+	condlog(3, "%s: tur checker finished, state %s", ct->devt,
+		checker_state_name(state));
 
 	running = uatomic_xchg(&ct->running, 0);
 	if (!running)
@@ -308,15 +293,14 @@  int libcheck_check(struct checker * c)
 	struct stat sb;
 	pthread_attr_t attr;
 	int tur_status, r;
-	char devt[32];
 
 	if (!ct)
 		return PATH_UNCHECKED;
 
-	if (fstat(c->fd, &sb) == 0) {
-		pthread_mutex_lock(&ct->lock);
-		ct->devt = sb.st_rdev;
-		pthread_mutex_unlock(&ct->lock);
+	if (uatomic_read(&ct->holders) == 1 && ct->devt[0] == 0 &&
+	    fstat(c->fd, &sb) == 0) {
+		snprintf(ct->devt, sizeof(ct->devt), "%d:%d", major(sb.st_rdev),
+			 minor(sb.st_rdev));
 	}
 
 	if (c->sync)
@@ -327,8 +311,7 @@  int libcheck_check(struct checker * c)
 	 */
 	r = pthread_mutex_lock(&ct->lock);
 	if (r != 0) {
-		condlog(2, "%s: tur mutex lock failed with %d",
-			tur_devt(devt, sizeof(devt), ct), r);
+		condlog(2, "%s: tur mutex lock failed with %d", ct->devt, r);
 		MSG(c, MSG_TUR_FAILED);
 		return PATH_WILD;
 	}
@@ -338,14 +321,12 @@  int libcheck_check(struct checker * c)
 			int running = uatomic_xchg(&ct->running, 0);
 			if (running)
 				pthread_cancel(ct->thread);
-			condlog(3, "%s: tur checker timeout",
-				tur_devt(devt, sizeof(devt), ct));
+			condlog(3, "%s: tur checker timeout", ct->devt);
 			ct->thread = 0;
 			MSG(c, MSG_TUR_TIMEOUT);
 			tur_status = PATH_TIMEOUT;
 		} else if (uatomic_read(&ct->running) != 0) {
-			condlog(3, "%s: tur checker not finished",
-					tur_devt(devt, sizeof(devt), ct));
+			condlog(3, "%s: tur checker not finished", ct->devt);
 			tur_status = PATH_PENDING;
 		} else {
 			/* TUR checker done */
@@ -361,7 +342,7 @@  int libcheck_check(struct checker * c)
 			if (ct->state == PATH_PENDING) {
 				pthread_mutex_unlock(&ct->lock);
 				condlog(3, "%s: tur thread not responding",
-					tur_devt(devt, sizeof(devt), ct));
+					ct->devt);
 				return PATH_TIMEOUT;
 			}
 		}
@@ -381,7 +362,7 @@  int libcheck_check(struct checker * c)
 			ct->thread = 0;
 			pthread_mutex_unlock(&ct->lock);
 			condlog(3, "%s: failed to start tur thread, using"
-				" sync mode", tur_devt(devt, sizeof(devt), ct));
+				" sync mode", ct->devt);
 			return tur_check(c->fd, c->timeout,
 					 copy_msg_to_checker, c);
 		}
@@ -392,8 +373,7 @@  int libcheck_check(struct checker * c)
 		pthread_mutex_unlock(&ct->lock);
 		if (uatomic_read(&ct->running) != 0 &&
 		    (tur_status == PATH_PENDING || tur_status == PATH_UNCHECKED)) {
-			condlog(3, "%s: tur checker still running",
-				tur_devt(devt, sizeof(devt), ct));
+			condlog(3, "%s: tur checker still running", ct->devt);
 			tur_status = PATH_PENDING;
 		} else {
 			int running = uatomic_xchg(&ct->running, 0);