diff mbox

[1/5] btrfs-progs: Fix wrong address accessing by subthread in btrfs-convert

Message ID f664529a996a01479aa3ed14d33a1cf8d92afa8e.1437743468.git.zhaolei@cn.fujitsu.com (mailing list archive)
State Accepted
Headers show

Commit Message

Zhaolei July 24, 2015, 1:16 p.m. UTC
From: Zhao Lei <zhaolei@cn.fujitsu.com>

We can reproduce this bug by a simple script:
  DEV=/dev/vdh
  for ((i = 0; i < 100; i++)); do
      echo "loop $i"
      mkfs.ext4 "$DEV" &>/dev/null || { echo mkfs fail; break; }
      btrfs-convert "$DEV" || break
  done

Result is like:
  loop 0
  ...
  loop 1
  ...
  loop 3
  create btrfs filesystem:
          blocksize: 4096
          nodesize:  16384
          features:  extref, skinny-metadata (default)
  creating btrfs metadata.

  creating ext2fs image file.
  trans 7 running 5
  ctree.c:363: btrfs_cow_block: Assertion `1` failed.
  btrfs-convert(btrfs_cow_block+0x92)[0x40acaf]
  btrfs-convert(btrfs_search_slot+0x1cb)[0x40c50f]
  btrfs-convert(btrfs_csum_file_block+0x20f)[0x41d83a]
  btrfs-convert[0x43422d]
  btrfs-convert[0x4342cd]
  btrfs-convert[0x4345ca]
  btrfs-convert[0x434767]
  btrfs-convert[0x435770]
  btrfs-convert[0x439748]
  btrfs-convert(main+0x13f8)[0x43b09d]
  /lib64/libc.so.6(__libc_start_main+0xfd)[0x335e01ecdd]
  btrfs-convert[0x407649]

Reason is complex:
1: main thread allocated a block of memory,
   shared with sub thread
2: main thread killed sub thread, and free above memory
3: main thread malloc a new one(in same address),
   and use it
4: sub thread(which is not really quit), write into
   this address, and caused this bug.

By adding some debug lines into code, we can see following output:
  create btrfs filesystem:
          blocksize: 4096
          nodesize:  16384
          features:  extref, skinny-metadata (default)
  creating btrfs metadata.
  1:  ctx(0x7ffe1abde230)->info=0xc65b80
  2:  task_period_start: will create periodic.timer_fd
  3:  task_stop: info->periodic.timer_fd = NULL
  4:  task_stop: begin pthread_cancel info->id=-1746053376
  5:  task_stop: done pthread_cancel ret=0
  6:  task_stop: begin info->postfn
  7:  task_period_stop: periodic.timer_fd NULL
  8:  task_stop: done info->postfn
  9:  task_stop: done all
  10: creating ext2fs image file.
  trans 7 running 5
  11: task_period_start: create periodic.timer_fd done info->periodic.timer_fd(0xc65b80)=7
  12: btrfs_cow_block: root->fs_info->generation(0xc63568) = 5 trans->transid(0xc65b80)=7
  13: ctree.c:368: btrfs_cow_block: Assertion `1` failed.
  ./btrfs-convert(btrfs_cow_block+0xda)[0x40ad37]
  ./btrfs-convert(btrfs_search_slot+0x1cb)[0x40c5b4]
  ./btrfs-convert(btrfs_insert_empty_items+0xac)[0x40d9f6]
  ./btrfs-convert(btrfs_record_file_extent+0xc0)[0x4183fe]
  ./btrfs-convert[0x435796]
  ./btrfs-convert[0x439b0c]
  ./btrfs-convert(main+0x13f8)[0x43b45d]
  /lib64/libc.so.6(__libc_start_main+0xfd)[0x335e01ecdd]
  ./btrfs-convert[0x407689]
  Conclusion:
  a: subthread should exit before step 5, but it is still running
     in step 11
  b: task_stop() hadn't close periodic.timer_fd in step3,
     because periodic.timer_fd is not initialized yet.
  c. address of 0xc65b80 is overwrited by subthread in step 11,
     but this address is freed and re-malloc by main thread
     before step 10, and used for trans->transid.
  d: trans->transid which is overwrite by subthread caused error
     in step 13.

Fix:
  pthread_cancel() only send a cancellation request to the thread,
  thread will quit in next cancellation point by default.
  To make sub thread quit in time, this patch add pthread_join()
  after pthread_cancel() call.
  And to make pthread_join() works, pthread_detach() is removed.

Test result:
  Passed 400 times of above script

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
---
 task-utils.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

David Sterba July 24, 2015, 1:39 p.m. UTC | #1
On Fri, Jul 24, 2015 at 09:16:25PM +0800, Zhaolei wrote:
> From: Zhao Lei <zhaolei@cn.fujitsu.com>

Good catch!

> We can reproduce this bug by a simple script:
>   DEV=/dev/vdh
>   for ((i = 0; i < 100; i++)); do
>       echo "loop $i"
>       mkfs.ext4 "$DEV" &>/dev/null || { echo mkfs fail; break; }
>       btrfs-convert "$DEV" || break
>   done

Can you please turn it into a testing script? Just put it into the misc
tests and add the usual preamble and paths. I'll apply this series, so
separate patch please, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/task-utils.c b/task-utils.c
index 10e3f0f..0390a69 100644
--- a/task-utils.c
+++ b/task-utils.c
@@ -50,9 +50,7 @@  int task_start(struct task_info *info)
 	ret = pthread_create(&info->id, NULL, info->threadfn,
 			     info->private_data);
 
-	if (ret == 0)
-		pthread_detach(info->id);
-	else
+	if (ret)
 		info->id = -1;
 
 	return ret;
@@ -66,8 +64,10 @@  void task_stop(struct task_info *info)
 	if (info->periodic.timer_fd)
 		close(info->periodic.timer_fd);
 
-	if (info->id > 0)
+	if (info->id > 0) {
 		pthread_cancel(info->id);
+		pthread_join(info->id, NULL);
+	}
 
 	if (info->postfn)
 		info->postfn(info->private_data);