diff mbox series

block/file-posix: fix the wrong result of find_allocation() in macOS.

Message ID 20180908141528.71260-1-jaywang0.tw@gmail.com (mailing list archive)
State New, archived
Headers show
Series block/file-posix: fix the wrong result of find_allocation() in macOS. | expand

Commit Message

王彥傑 Sept. 8, 2018, 2:15 p.m. UTC
In macOS, lseek with SEEK_DATA behaves differently.
It seeks to the next data region even though offset is in the middle of
a data region. In addition, there may be many data regions without any
hole among them, like this: |---Data---|---Data---|

Because of this, qemu-img convert with raw images as input may create
corrupted images in macOS especially for large files, and qemu-img
map may also report wrong things. This patch fixes this undesired
behaviors.

Signed-off-by: Yan-Jie Wang <jaywang0.tw@gmail.com>
---
 block/file-posix.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

Comments

Peter Maydell Sept. 8, 2018, 3:34 p.m. UTC | #1
On 8 September 2018 at 15:15, Yan-Jie Wang <jaywang0.tw@gmail.com> wrote:
> In macOS, lseek with SEEK_DATA behaves differently.
> It seeks to the next data region even though offset is in the middle of
> a data region. In addition, there may be many data regions without any
> hole among them, like this: |---Data---|---Data---|
>
> Because of this, qemu-img convert with raw images as input may create
> corrupted images in macOS especially for large files, and qemu-img
> map may also report wrong things. This patch fixes this undesired
> behaviors.

Hi. I have two general questions here:
(1) is this behaviour of SEEK_DATA specific to macOS, or do the
other BSDs (FreeBSD, OpenBSD, NetBSD) also have it ?
(2) is there a way to determine which flavour of SEEK_DATA we
have as a configure-time test rather than having to hardcode
an OS-specific #ifdef ?

thanks
-- PMM
Eric Blake Sept. 10, 2018, 3:10 p.m. UTC | #2
On 9/8/18 10:34 AM, Peter Maydell wrote:
> On 8 September 2018 at 15:15, Yan-Jie Wang <jaywang0.tw@gmail.com> wrote:
>> In macOS, lseek with SEEK_DATA behaves differently.
>> It seeks to the next data region even though offset is in the middle of
>> a data region. In addition, there may be many data regions without any
>> hole among them, like this: |---Data---|---Data---|
>>
>> Because of this, qemu-img convert with raw images as input may create
>> corrupted images in macOS especially for large files, and qemu-img
>> map may also report wrong things. This patch fixes this undesired
>> behaviors.
> 
> Hi. I have two general questions here:
> (1) is this behaviour of SEEK_DATA specific to macOS, or do the
> other BSDs (FreeBSD, OpenBSD, NetBSD) also have it ?

I hope no one else has it, because it is wrong, and should be fixed in 
MacOS to comply with the POSIX recommendation:

http://austingroupbugs.net/view.php?id=415#c862

The commit message ought to call out this link, when stating that the 
patch is a workaround for a buggy syscall.

> (2) is there a way to determine which flavour of SEEK_DATA we
> have as a configure-time test rather than having to hardcode
> an OS-specific #ifdef ?

Rather than hard-coding an OS-specific #ifdef, if we really are stuck 
working around MacOS braindead departure from implementation 
compatibility, then here's what Bruno recommended on the gnulib list:

https://lists.gnu.org/archive/html/bug-gnulib/2018-09/msg00058.html
> 
> How about one of these approaches?
> 
> (a) Put some knowledge about the extent boundaries into the code.
>     I.e. round offset down to the previous extend boundary before the
>     two lseek calls?

Except that I know of no clean way to learn what the actual extent 
boundary of a random offset is going to be.

> 
> (b) Evaluate both
>         d = lseek (fd, SEEK_DATA, offset);
>         h = lseek (fd, SEEK_HOLE, offset);
>     and since you don't know anything about the range from offset
>     to min(d,h)-1, assume it's data.
>     Then, if d < h, you have a data block, or if h < d, you have a hole.
> 

Our code currently optimizes by trying to use only one instead of two 
lseek() calls where possible, but this workaround seems like the most 
viable of any solution without even more syscalls.

> (c) for (int i = 1; i <= 16; i++)
>       {
>         unsigned long o = max (offset - (1 << i), 0);
>         d = lseek (fd, SEEK_DATA, o);
>         h = lseek (fd, SEEK_HOLE, o);
>         if (d < offset || h < offset)
>           break;
>       }

That may work to accomplish the semantics of proposal 1), but it is an 
awful lot of syscalls, and once you've made two, you don't really need 
to make more than two.
王彥傑 Sept. 11, 2018, 2:01 a.m. UTC | #3
This is the program I used to check the behavior of SEEK_DATA and SEEK_HOLE.

#define _FILE_OFFSET_BITS 64
#define _LARGEFILE64_SOURCE 1
#include <sys/types.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <stdint.h>
#include <fcntl.h>
#include <errno.h>
#include <stdio.h>

/*
 * This function is suitable for macOS and Solaris' lseek behavior,
 * since start is always set to the beginning of a data region or a hole.
 * Please see the logics in main()
 */
static int find_allocation(int fd, off_t start, off_t *data, off_t*hole)
{
    off_t offs;

    offs = lseek(fd, start, SEEK_DATA);

    if (offs < 0) {
        return -errno;
    }

    if (offs < start) {
        return -EIO;
    }

    if (offs > start) {
        /* D2: start is at the beginning of hole */
        *hole = start;
        *data = offs;
        return 0;
    }

    /* D1: start is at beginning of data region */
    offs = lseek(fd, start, SEEK_HOLE);
    
    if (offs < 0) {
        return -errno;
    }

    if (offs < start) {
        return -EIO;
    }

    if (offs > start) {
        *data = start;
        *hole = offs;
        return 0;
    }

    return -EIO;
}

int main(int argc, const char* argv[])
{
    off_t first_data = -1, trailing_hole = -1;
    off_t cursor = 0;

    if (argc != 2) {
        fprintf(stderr, "Usage: %s path\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("Cannot open the file");
        return 1;
    }

    off_t filesize = lseek(fd, 0, SEEK_END);
    if (filesize < 0) {
        perror("Cannot get the file size");
        return 1;
    }

    if (filesize < 2) {
        fprintf(stderr, "Filesize is too small.\n");
        return 1;
    }

    printf ("Filesize: %lld\n", (long long)filesize);

    while(cursor < filesize) {
        off_t data, hole;
        int res;

        /* cursor is always at the beginning of a data region or a hole */
        res = find_allocation(fd, cursor, &data, &hole);
        
        if (res < 0 && res != -ENXIO) {
            fprintf(stderr, "The filesystem or platform being checked does not support SEEK_DATA or SEEK_HOLE.\n");
            perror("");
            return 1;
        }

        if (res == -ENXIO) {
            /* we are at the trailing hole */
            trailing_hole = cursor;
            break;
        }

        if (data == cursor && (hole - data) > 1) {
            /* the length of the data region must be greater than 1. */
            if (first_data == -1) {
                first_data = cursor;
            }
        }

        if (data == cursor) {
            cursor = hole;
        } else {
            cursor = data;
        }
    }

    if (first_data >= 0) {
        printf("Checking for SEEK_DATA by using the data region at %lld... ", (long long)first_data);
        /* first_data plus 1 makes the offset in the middle of a data region */
        errno = 0;
        off_t offs = lseek(fd, first_data + 1, SEEK_DATA);

        if (offs < 0 && errno != ENXIO) {
            printf("Error\n");
            perror(" Msg");
        } else if (errno == ENXIO || offs > first_data + 1) {
            /* offs is set to the next data region. This is macOS's behavior */
            printf("macOS\n");
        } else if (offs == first_data + 1) {
            printf("Linux\n");
        } else {
            printf("Unknown behavior\n");
        }
    } else {
        fprintf(stderr, "There is no data region which is suitable to be checked.\n");
    }

    if (trailing_hole >= 0) {
        printf("Checking for SEEK_HOLE by using the trailing hole at %lld... ", (long long)trailing_hole);
        off_t offs = lseek(fd, trailing_hole, SEEK_HOLE);

        if (offs < 0) {
            printf("Error\n");
            perror(" Msg");
        } else if (offs == filesize) {
            /* offs is set to EOF. This is Solaris' behavior */
            printf("Solaris\n");
        } else if (offs == trailing_hole) {
            printf("Linux\n");
        } else {
            printf("Unknown behavior\n");
        }
    } else {
        fprintf(stderr, "There is no trailing hole which is suitable to be checked.\n");
    }

    return 0;
}



Peter Maydell - 2018/9/8 11:34 PM:
> On 8 September 2018 at 15:15, Yan-Jie Wang <jaywang0.tw@gmail.com> wrote:
>> In macOS, lseek with SEEK_DATA behaves differently.
>> It seeks to the next data region even though offset is in the middle of
>> a data region. In addition, there may be many data regions without any
>> hole among them, like this: |---Data---|---Data---|
>>
>> Because of this, qemu-img convert with raw images as input may create
>> corrupted images in macOS especially for large files, and qemu-img
>> map may also report wrong things. This patch fixes this undesired
>> behaviors.
> 
> Hi. I have two general questions here:
> (1) is this behaviour of SEEK_DATA specific to macOS, or do the
> other BSDs (FreeBSD, OpenBSD, NetBSD) also have it ?
I have installed FreeBSD in Virtualbox and checked the behavior of lseek in FreeBSD.
The behavior of SEEK_DATA is the same as the one in Linux.

> (2) is there a way to determine which flavour of SEEK_DATA we
> have as a configure-time test rather than having to hardcode
> an OS-specific #ifdef ?
macOS can be installed on HFS+ or APFS filesystem. Only APFS supports SEEK_DATA and SEEK_HOLE.

If we try to build qemu on HFS+ filesystem, it is not possible to detect the behavior
of SEEK_DATA and SEEK_HOLE on configure-time.  lseek with SEEK_DATA or SEEK_HOLE returns
errors when the file being checked is on HFS+ filesystem. (I have checked it by formatting
my USB thumb drive to HFS+ filesystem and running the program provided at the top of this email on the file located at the HFS+ filesystem on my thumb drive.)

> 
> thanks
> -- PMM
>
diff mbox series

Patch

diff --git a/block/file-posix.c b/block/file-posix.c
index fe83cbf0eb..5c208580e6 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2325,6 +2325,7 @@  static int find_allocation(BlockDriverState *bs, off_t start,
     BDRVRawState *s = bs->opaque;
     off_t offs;
 
+#if !(defined(__APPLE__) && defined(__MACH__))
     /*
      * SEEK_DATA cases:
      * D1. offs == start: start is in data
@@ -2395,6 +2396,64 @@  static int find_allocation(BlockDriverState *bs, off_t start,
         *hole = offs;
         return 0;
     }
+#else
+    /*
+     * In macOS, lseek with SEEK_DATA seeks to the next data region
+     * even though the offset is in the middle of a data region.
+     * In addition, there may be many data regions without any holes among
+     * them, like this:  |----Data----|----Data----|
+     *
+     * Although the behavior of lseek with SEEK_DATA is different in macOS,
+     * the behavior of lseek with SEEK_HOLE in macOS is the same as the one in
+     * Linux.
+     *
+     * Therefore, the cases D1, D2 and H2 are changed to the followings
+     * for macOS:
+     *  D1. offs == start: start is at the beginning of a data region.
+     *  D2. offs > start: either start is in a hole, next data at offs
+     *      or start is in the middle of a data region,
+     *      next data at offs.
+     *  H2. offs > start: start is in data, next hole at offs
+     */
+
+    offs = lseek(s->fd, start, SEEK_HOLE);
+    if (offs < 0) {
+        return -errno;  /* H3 or H4 */
+    }
+
+    if (offs < start) {
+        /* This is not a valid return by lseek().  We are safe to just return
+         * -EIO in this case, and we'll treat it like D4. */
+        return -EIO;
+    }
+
+    if (offs > start) {
+        /* H2: start is in data, next hole at offs */
+        *data = start;
+        *hole = offs;
+        return 0;
+    }
+
+    /* H1: start is in a hole */
+    offs = lseek(s->fd, start, SEEK_DATA);
+
+    if (offs < 0) {
+        return -errno;  /* H1 and (D3 or D4) */
+    }
+
+    if (offs < start) {
+        /* This is not a valid return by lseek().  We are safe to just return
+         * -EIO in this case, and we'll treat it like D4. */
+        return -EIO;
+    }
+
+    if (offs > start) {
+        /* H1 and D2: start is in a hole, next data at offs */
+        *hole = start;
+        *data = offs;
+        return 0;
+    }
+#endif
 
     /* D1 and H1 */
     return -EBUSY;