
[01/27] xfs_scrub: create online filesystem scrub program

Message ID: 151520349393.2027.11445111828418979100.stgit@magnolia
State: Superseded
Commit Message

Darrick J. Wong Jan. 6, 2018, 1:51 a.m. UTC
From: Darrick J. Wong <darrick.wong@oracle.com>

Create the foundations of a filesystem scrubbing tool that asks the
kernel to inspect all metadata in the filesystem and (ultimately) to
repair anything that's broken.  Also create the man page for the
utility.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 .gitignore                   |    1 
 Makefile                     |    3 +
 man/man8/xfs_scrub.8         |  117 ++++++++++++++++++++++++++++++++++++++++++
 scrub/Makefile               |   42 +++++++++++++++
 scrub/common.c               |   20 +++++++
 scrub/common.h               |   23 ++++++++
 scrub/xfs_scrub.c            |  109 +++++++++++++++++++++++++++++++++++++++
 scrub/xfs_scrub.h            |   23 ++++++++
 tools/find-api-violations.sh |    2 -
 9 files changed, 338 insertions(+), 2 deletions(-)
 create mode 100644 man/man8/xfs_scrub.8
 create mode 100644 scrub/Makefile
 create mode 100644 scrub/common.c
 create mode 100644 scrub/common.h
 create mode 100644 scrub/xfs_scrub.c
 create mode 100644 scrub/xfs_scrub.h



--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric Sandeen Jan. 12, 2018, 12:16 a.m. UTC | #1
On 1/5/18 7:51 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>

<man page nitpicking>

> diff --git a/man/man8/xfs_scrub.8 b/man/man8/xfs_scrub.8
> new file mode 100644
> index 0000000..95f4fea
> --- /dev/null
> +++ b/man/man8/xfs_scrub.8
> @@ -0,0 +1,117 @@
> +.TH xfs_scrub 8
> +.SH NAME
> +xfs_scrub \- scrub the contents of an XFS filesystem
> +.SH SYNOPSIS
> +.B xfs_scrub
> +[
> +.B \-abemnTvVxy
               ^
> +]
> +.I mount-point

or block device?

> +.br
> +.B xfs_scrub \-V
                  ^

If V is special it probably shouldn't be in the first arg string?

Do you mean to hide the "-d" option?


> +.SH DESCRIPTION
> +.B xfs_scrub
> +attempts to check and repair all metadata in a mounted XFS filesystem.
> +.PP
> +.B xfs_scrub
> +asks the kernel to scrub all metadata objects in the filesystem.
> +Metadata records are scanned for obviously bad values and then
> +cross-referenced against other metadata.
> +The goal is to establish a threasonable confidence about the consistency

"reasonable"

> +of the overall filesystem by examining the consistency of individual
> +metadata records against the other metadata in the filesystem across the
> +entire filesystem.

Redundant, "examining the consistency of individual metadata records against
the other metadata in the filesystem."  would suffice.

> +Damaged metadata can be rebuilt from other metadata if there is
> +sufficient redundancy (and no other corruption) in the metadata.

Again redundant, maybe just "if there is sufficient redundancy within
other intact metadata?"

> +.PP
> +This utility does not know how to correct all errors.
> +If the tool cannot fix the detected errors, you must unmount the
> +filesystem and run
> +.B xfs_repair
> +to fix the problems.
> +If this tool is not run with either of the
> +.B \-n
> +or
> +.B \-y
> +options, then it will optimize the filesystem when possible,
> +but it will not try to fix errors.

I think the manpage needs to describe what this optimization might
involve, at least at a high level.  Will it fsr all my files? Will
it trim my free space?  Will it compact my directories?  Will it ...?
What exactly am I agreeing to here? :)

> +.SH OPTIONS
> +.TP
> +.BI \-a " errors"
> +Abort if more than this many errors are found on the filesystem.
> +.TP
> +.B \-b
> +Run in background mode.
> +If the option is specified once, only run a single scrubbing thread at a
> +time.
> +If given more than once, an artificial delay of 100us is added to each
> +scrub call to reduce CPU overhead even further.

I wonder, should it take a value instead of -bbbbbbbbb?

> +.TP
> +.B \-e
> +Specifies what happens when errors are detected.
> +If
> +.IR shutdown
> +is given, the filesystem will be taken offline if errors are found.
> +Not all backends can shut down a filesystem.

<user> what's a backend? </user>

> +If
> +.IR continue
> +is given, no action taken if errors are found.
> +This is the default.

<user> so how do I know what errors were found? </user>

> +.TP
> +.BI \-m " file"
> +Search this file for mounted filesystems instead of /etc/mtab.
> +.TP
> +.B \-n
> +Dry run, do not modify anything in the filesystem.
> +This disables all preening and optimization behaviors, and disables
> +calling FITRIM on the free space after a successful run.

what if I only want to disable FITRIM?  (-k?)
Oh, and it runs FITRIM?  Can you mention that more prominently
in the behavior description?  (and should it, given that we
have a tool for that purpose?)

> +.TP
> +.BI \-T
> +Print timing and memory usage information for each phase.
> +.TP
> +.B \-v
> +Enable verbose mode, which prints periodic status updates.
> +.TP
> +.B \-V
> +Prints the version number and exits.
> +.TP
> +.B \-x
> +Scrub all file data too.

colloquial?  maybe s/too/as well/ 

> +The block list will be sorted in disk order for better performance.

Cool, so when I'm done, my filesystem will have better performance if I use -x?
and none of my files will be corrupted!  ;)

The read order is probably an implementation detail that doesn't need to be in
the manpage.  It may be worth changing the description a bit to make it
clearer that the purpose is to determine readability of every file block?
I mean, that should probably be obvious, but ...

> +.B xfs_scrub
> +will issue O_DIRECT reads to the block device directly.
> +If the block device is a SCSI disk, it will issue READ VERIFY commands
> +directly to the disk.

+ These actions will confirm that all file data blocks can be read from storage.

or something?

> +.TP
> +.B \-y
> +Try to repair all filesystem errors.
> +If the errors cannot be fixed online, then the filesystem must be taken
> +offline for repair.
> +.SH EXIT CODE
> +The exit code returned by
> +.B xfs_scrub
> +is the sum of the following conditions:
> +.br
> +\	0\	\-\ No errors
> +.br
> +\	1\	\-\ File system errors left uncorrected
> +.br
> +\	2\	\-\ File system optimizations possible
> +.br
> +\	4\	\-\ Operational error
> +.br
> +\	8\	\-\ Usage or syntax error
> +.br
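
Because the listed values are distinct powers of two, "the sum of the following conditions" can be decoded with bit tests. The sketch below is only an illustration of that reading: the macro and function names are hypothetical, and only the numeric values and messages come from the EXIT CODE section quoted above.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical names; the values are those listed in the man page. */
#define SCRUB_EXIT_UNCORRECTED	1
#define SCRUB_EXIT_OPTIMIZABLE	2
#define SCRUB_EXIT_OPERROR	4
#define SCRUB_EXIT_USAGE	8

struct exit_cond {
	int		bit;
	const char	*msg;
};

static const struct exit_cond exit_conds[] = {
	{ SCRUB_EXIT_UNCORRECTED,	"File system errors left uncorrected" },
	{ SCRUB_EXIT_OPTIMIZABLE,	"File system optimizations possible" },
	{ SCRUB_EXIT_OPERROR,		"Operational error" },
	{ SCRUB_EXIT_USAGE,		"Usage or syntax error" },
};

/*
 * Print one line per condition encoded in status and return how many
 * conditions were set.  A status of 0 prints "No errors".
 */
int
describe_scrub_exit(int status, FILE *out)
{
	size_t	i;
	int	nr = 0;

	if (status == 0) {
		fprintf(out, "No errors\n");
		return 0;
	}
	for (i = 0; i < sizeof(exit_conds) / sizeof(exit_conds[0]); i++) {
		if (status & exit_conds[i].bit) {
			fprintf(out, "%s\n", exit_conds[i].msg);
			nr++;
		}
	}
	return nr;
}
```

Under this reading, an exit status of 3 would mean both uncorrected errors and possible optimizations.
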
> +.SH CAVEATS
> +.B xfs_scrub
> +is an immature utility!

Might it damage my filesystem? ;)

> +This program takes advantage of in-kernel scrubbing to verify a given
> +data structure with locks held.
> +The kernel must support the BULKSTAT, FSGEOMETRY, FSCOUNTS, GET_RESBLKS,
> +GETBMAPX, GETFSMAP, INUMBERS, and SCRUB_METADATA ioctls.

Some of those ioctls are ancient and probably don't need to be specified...
Can you do anything at all without SCRUB_METADATA?  If not,
is SCRUB_METADATA sufficient to determine that the kernel has the rest
of what it needs?

> +This can tie up the system for a while.

Maybe that's a statement to go right after "locks held"

> +.PP
> +If errors are found and cannot be repaired, the filesystem must be taken
> +offline and repaired.

"unmounted and repaired" might be more specific?  *shrug*

> +.SH SEE ALSO
> +.BR xfs_repair (8).


Eric Sandeen Jan. 12, 2018, 1:07 a.m. UTC | #2
On 1/5/18 7:51 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Create the foundations of a filesystem scrubbing tool that asks the
> kernel to inspect all metadata in the filesystem and (ultimately) to
> repair anything that's broken.  Also create the man page for the
> utility.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

...

> +/*
> + * XFS Online Metadata Scrub (and Repair)
> + *
> + * The XFS scrubber uses custom XFS ioctls to probe more deeply into the
> + * internals of the filesystem.  It takes advantage of scrubbing ioctls
> + * to check all the records stored in a metadata object and to
> + * cross-reference those records against the other filesystem metadata.
> + *
> + * After the program gathers command line arguments to figure out
> + * exactly what the user wants the program is going to do, scrub

* exactly what the user wants the program to do

or -

* exactly what the program is going to do

or -

* exactly what the user wants to do

:)

> + * execution is split up into several separate phases:
> + *
> + * The "find geometry" phase queries XFS for the filesystem geometry.
> + * The block devices for the data, realtime, and log devices are opened.
> + * Kernel ioctls are test-queried to see if they actually work (the scrub
> + * ioctl in particular), and any other filesystem-specific information
> + * is gathered.
> + *
> + * In the "check internal metadata" phase, we call the metadata scrub
> + * ioctl to check the filesystem's internal per-AG btrees.  This
> + * includes the AG superblock, AGF, AGFL, and AGI headers, freespace
> + * btrees, the regular and free inode btrees, the reverse mapping
> + * btrees, and the reference counting btrees.  If the realtime device is
> + * enabled, the realtime bitmap and reverse mapping btrees are enabled.

checked?

-Eric
Darrick J. Wong Jan. 12, 2018, 1:08 a.m. UTC | #3
On Thu, Jan 11, 2018 at 06:16:02PM -0600, Eric Sandeen wrote:
> On 1/5/18 7:51 PM, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> <man page nitpicking>
> 
> > diff --git a/man/man8/xfs_scrub.8 b/man/man8/xfs_scrub.8
> > new file mode 100644
> > index 0000000..95f4fea
> > --- /dev/null
> > +++ b/man/man8/xfs_scrub.8
> > @@ -0,0 +1,117 @@
> > +.TH xfs_scrub 8
> > +.SH NAME
> > +xfs_scrub \- scrub the contents of an XFS filesystem
> > +.SH SYNOPSIS
> > +.B xfs_scrub
> > +[
> > +.B \-abemnTvVxy
>                ^
> > +]
> > +.I mount-point
> 
> or block device?
> 
> > +.br
> > +.B xfs_scrub \-V
>                   ^
> 
> If V is special it probably shouldn't be in the first arg string?

Yes, fixed.

> Do you mean to hide the "-d" option?

-d turns on debug mode; I was going to keep that hidden from users.

> 
> > +.SH DESCRIPTION
> > +.B xfs_scrub
> > +attempts to check and repair all metadata in a mounted XFS filesystem.
> > +.PP
> > +.B xfs_scrub
> > +asks the kernel to scrub all metadata objects in the filesystem.
> > +Metadata records are scanned for obviously bad values and then
> > +cross-referenced against other metadata.
> > +The goal is to establish a threasonable confidence about the consistency
> 
> "reasonable"

Fixed.

> > +of the overall filesystem by examining the consistency of individual
> > +metadata records against the other metadata in the filesystem across the
> > +entire filesystem.
> 
> Redundant, "examining the consistency of individual metadata records against
> > the other metadata in the filesystem."  would suffice.

Fixed.

> > +Damaged metadata can be rebuilt from other metadata if there is
> > +sufficient redundancy (and no other corruption) in the metadata.
> 
> Again redundant, maybe just "if there is sufficient redundancy within
> other intact metadata?"

"Damaged metadata can be rebuilt from other metadata if there exist
redundant data structures that are intact."

?

> > +.PP
> > +This utility does not know how to correct all errors.
> > +If the tool cannot fix the detected errors, you must unmount the
> > +filesystem and run
> > +.B xfs_repair
> > +to fix the problems.
> > +If this tool is not run with either of the
> > +.B \-n
> > +or
> > +.B \-y
> > +options, then it will optimize the filesystem when possible,
> > +but it will not try to fix errors.
> 
> I think the manpage needs to describe what this optimization might
> involve, at least at a high level.  Will it fsr all my files? Will
> it trim my free space?  Will it compact my directories?  Will it ...?
> What exactly am I agreeing to here? :)

"Optimizations may include, but are not limited to, activities such as
compacting metadata or bypassing shared block write checks for files
that no longer share blocks."

> > +.SH OPTIONS
> > +.TP
> > +.BI \-a " errors"
> > +Abort if more than this many errors are found on the filesystem.
> > +.TP
> > +.B \-b
> > +Run in background mode.
> > +If the option is specified once, only run a single scrubbing thread at a
> > +time.
> > +If given more than once, an artificial delay of 100us is added to each
> > +scrub call to reduce CPU overhead even further.
> 
> I wonder, should it take a value instead of -bbbbbbbbb?

More than ten -b and this program gets reallllly slow.  There are
currently six global fs checks, ten per-AG checks, and seven per-file
checks.  On my /home filesystem with 4M inodes and 32 AGs that adds up
to...

6 + (32 * 10) + (4M * 7) == ~28M scrub calls, or 324 days to perform
a scan.
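
The arithmetic above can be checked with a small sketch. The function names are hypothetical and the inputs are the figures from this message, with 4M taken as 4,000,000 inodes. Note that the 324-day figure corresponds to roughly one second of delay per call; at the 100us delay mentioned in the man page text the same call count adds about 47 minutes. How the delay grows with additional -b flags is not stated in this thread.

```c
#include <stdint.h>

/* Total scrub calls: global checks + per-AG checks + per-file checks. */
uint64_t
scrub_call_count(uint64_t nr_global, uint64_t nr_ags, uint64_t per_ag,
		 uint64_t nr_inodes, uint64_t per_file)
{
	return nr_global + nr_ags * per_ag + nr_inodes * per_file;
}

/* Whole seconds spent sleeping if every call is delayed by delay_us. */
uint64_t
total_delay_seconds(uint64_t calls, uint64_t delay_us)
{
	return calls * delay_us / 1000000;
}
```

With 6 global checks, 32 AGs at 10 checks each, and 4,000,000 inodes at 7 checks each, scrub_call_count() gives 28,000,326 calls.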

> > +.TP
> > +.B \-e
> > +Specifies what happens when errors are detected.
> > +If
> > +.IR shutdown
> > +is given, the filesystem will be taken offline if errors are found.
> > +Not all backends can shut down a filesystem.
> 
> <user> what's a backend? </user>

Leftover remnant from the days when this was a frankentool that could be
used to walk filesystems via the standard interfaces.  I removed this
sentence.

> > +If
> > +.IR continue
> > +is given, no action taken if errors are found.
> > +This is the default.
> 
> <user> so how do I know what errors were found? </user>

"Filesystem corruption and optimization opportunities will be logged to
the standard error stream."

I'll put that at the top.

> > +.TP
> > +.BI \-m " file"
> > +Search this file for mounted filesystems instead of /etc/mtab.
> > +.TP
> > +.B \-n
> > +Dry run, do not modify anything in the filesystem.
> > +This disables all preening and optimization behaviors, and disables
> > +calling FITRIM on the free space after a successful run.
> 
> what if I only want to disable FITRIM?  (-k?)

Oh all right. :)

> Oh, and it runs FITRIM?  Can you mention that more prominently
> in the behavior description?

I'll put it in the list of optimizations.

> (and should it, given that we have a tool for that purpose?)

Yes, we have fstrim, but I consider it too scary to run out of the
blue without checking the health of the free space info first.

> > +.TP
> > +.BI \-T
> > +Print timing and memory usage information for each phase.
> > +.TP
> > +.B \-v
> > +Enable verbose mode, which prints periodic status updates.
> > +.TP
> > +.B \-V
> > +Prints the version number and exits.
> > +.TP
> > +.B \-x
> > +Scrub all file data too.
> 
> colloquial?  maybe s/too/as well/ 

"Read all file data extents to look for disk errors."

> > +The block list will be sorted in disk order for better performance.
> 
> Cool, so when I'm done, my filesystem will have better performance if I use -x?
> and none of my files will be corrupted!  ;)
> 
> The read order is probably an implementation detail that doesn't need to be in
> the manpage.  It may be worth changing the description a bit to make it
> clearer that the purpose is to determine readability of every file block?
> I mean, that should probably be obvious, but ...

Eh, I'll just remove it.

> > +.B xfs_scrub
> > +will issue O_DIRECT reads to the block device directly.
> > +If the block device is a SCSI disk, it will issue READ VERIFY commands
> > +directly to the disk.
> 
> + These actions will confirm that all file data blocks can be read from storage.
> 
> or something?

Ok, added that verbatim.

> > +.TP
> > +.B \-y
> > +Try to repair all filesystem errors.
> > +If the errors cannot be fixed online, then the filesystem must be taken
> > +offline for repair.
> > +.SH EXIT CODE
> > +The exit code returned by
> > +.B xfs_scrub
> > +is the sum of the following conditions:
> > +.br
> > +\	0\	\-\ No errors
> > +.br
> > +\	1\	\-\ File system errors left uncorrected
> > +.br
> > +\	2\	\-\ File system optimizations possible
> > +.br
> > +\	4\	\-\ Operational error
> > +.br
> > +\	8\	\-\ Usage or syntax error
> > +.br
> > +.SH CAVEATS
> > +.B xfs_scrub
> > +is an immature utility!
> 
> Might it damage my filesystem? ;)

It glides as softly as a piston!




...oh, are we not doing the monorail song?

> > +This program takes advantage of in-kernel scrubbing to verify a given
> > +data structure with locks held.

"This program takes advantage of in-kernel scrubbing to verify a given
data structure with locks held and can keep the filesystem busy for a
long time."

> > +The kernel must support the BULKSTAT, FSGEOMETRY, FSCOUNTS, GET_RESBLKS,
> > +GETBMAPX, GETFSMAP, INUMBERS, and SCRUB_METADATA ioctls.
> 
> Some of those ioctls are ancient and probably don't need to be specified...
> Can you do anything at all without SCRUB_METADATA?  If not,
> is SCRUB_METADATA sufficient to determine that the kernel has the rest
> of what it needs?

SCRUB_METADATA is enough, provided we don't get kernel-tinyfication'd.

> > +This can tie up the system for a while.
> 
> Maybe that's a statement to go right after "locks held"

Ok.

> > +.PP
> > +If errors are found and cannot be repaired, the filesystem must be taken
> > +offline and repaired.
> 
> "unmounted and repaired" might be more specific?  *shrug*

Ok.

--D

> > +.SH SEE ALSO
> > +.BR xfs_repair (8).
> 
> 
Darrick J. Wong Jan. 12, 2018, 1:10 a.m. UTC | #4
On Thu, Jan 11, 2018 at 07:07:43PM -0600, Eric Sandeen wrote:
> 
> 
> On 1/5/18 7:51 PM, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Create the foundations of a filesystem scrubbing tool that asks the
> > kernel to inspect all metadata in the filesystem and (ultimately) to
> > repair anything that's broken.  Also create the man page for the
> > utility.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> ...
> 
> > +/*
> > + * XFS Online Metadata Scrub (and Repair)
> > + *
> > + * The XFS scrubber uses custom XFS ioctls to probe more deeply into the
> > + * internals of the filesystem.  It takes advantage of scrubbing ioctls
> > + * to check all the records stored in a metadata object and to
> > + * cross-reference those records against the other filesystem metadata.
> > + *
> > + * After the program gathers command line arguments to figure out
> > + * exactly what the user wants the program is going to do, scrub
> 
> * exactly what the user wants the program to do
> 
> or -
> 
> * exactly what the program is going to do
> 
> or -
> 
> * exactly what the user wants to do
> 
> :)

The second.  The program can figure out what the program is going to do;
it has no idea what the user wants.

> > + * execution is split up into several separate phases:
> > + *
> > + * The "find geometry" phase queries XFS for the filesystem geometry.
> > + * The block devices for the data, realtime, and log devices are opened.
> > + * Kernel ioctls are test-queried to see if they actually work (the scrub
> > + * ioctl in particular), and any other filesystem-specific information
> > + * is gathered.
> > + *
> > + * In the "check internal metadata" phase, we call the metadata scrub
> > + * ioctl to check the filesystem's internal per-AG btrees.  This
> > + * includes the AG superblock, AGF, AGFL, and AGI headers, freespace
> > + * btrees, the regular and free inode btrees, the reverse mapping
> > + * btrees, and the reference counting btrees.  If the realtime device is
> > + * enabled, the realtime bitmap and reverse mapping btrees are enabled.
> 
> checked?

Fixed.
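
For readers following the phase description, one way to picture the split is a table of phase functions run in order. This is only a sketch under stated assumptions: the struct, the stub, and the function names are hypothetical, and only the phase names come from the header comment in scrub/xfs_scrub.c. The patch itself contains no phase code yet.

```c
#include <stddef.h>
#include <stdio.h>

/* One entry per phase; fn returns 0 on success, nonzero to stop. */
struct scrub_phase {
	const char	*descr;
	int		(*fn)(void);
};

/* Placeholder body; a real phase would issue the relevant ioctls. */
static int
phase_stub(void)
{
	return 0;
}

/* Phase names in the order the header comment describes them. */
static const struct scrub_phase scrub_phases[] = {
	{ "find geometry",		phase_stub },
	{ "check internal metadata",	phase_stub },
	{ "scan all inodes",		phase_stub },
	{ "repair filesystem",		phase_stub },
	{ "check directory tree",	phase_stub },
	{ "verify data file integrity",	phase_stub },
	{ "check summary counters",	phase_stub },
};

/* Run phases in order; return how many completed before a failure. */
int
run_scrub_phases(FILE *log)
{
	size_t	nr = sizeof(scrub_phases) / sizeof(scrub_phases[0]);
	size_t	i;

	for (i = 0; i < nr; i++) {
		fprintf(log, "Phase %zu: %s\n", i + 1, scrub_phases[i].descr);
		if (scrub_phases[i].fn() != 0)
			break;
	}
	return (int)i;
}
```

Stopping at the first failing phase mirrors the comment's note that xfs_scrub stops after the repair phase if uncorrectable errors remain.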

--D

> -Eric

Patch

diff --git a/.gitignore b/.gitignore
index e839e2a..a3db640 100644
--- a/.gitignore
+++ b/.gitignore
@@ -68,6 +68,7 @@  cscope.*
 /repair/xfs_repair
 /rtcp/xfs_rtcp
 /spaceman/xfs_spaceman
+/scrub/xfs_scrub
 
 # generated crc files
 /libxfs/crc32selftest
diff --git a/Makefile b/Makefile
index 0dce80a..3bd0796 100644
--- a/Makefile
+++ b/Makefile
@@ -48,7 +48,7 @@  LIBFROG_SUBDIR = libfrog
 DLIB_SUBDIRS = libxlog libxcmd libhandle
 LIB_SUBDIRS = libxfs $(DLIB_SUBDIRS)
 TOOL_SUBDIRS = copy db estimate fsck growfs io logprint mkfs quota \
-		mdrestore repair rtcp m4 man doc debian spaceman
+		mdrestore repair rtcp m4 man doc debian spaceman scrub
 
 ifneq ("$(PKG_PLATFORM)","darwin")
 TOOL_SUBDIRS += fsr
@@ -91,6 +91,7 @@  repair: libxlog libxcmd
 copy: libxlog
 mkfs: libxcmd
 spaceman: libxcmd
+scrub: libhandle libxcmd
 
 ifeq ($(HAVE_BUILDDEFS), yes)
 include $(BUILDRULES)
diff --git a/man/man8/xfs_scrub.8 b/man/man8/xfs_scrub.8
new file mode 100644
index 0000000..95f4fea
--- /dev/null
+++ b/man/man8/xfs_scrub.8
@@ -0,0 +1,117 @@ 
+.TH xfs_scrub 8
+.SH NAME
+xfs_scrub \- scrub the contents of an XFS filesystem
+.SH SYNOPSIS
+.B xfs_scrub
+[
+.B \-abemnTvVxy
+]
+.I mount-point
+.br
+.B xfs_scrub \-V
+.SH DESCRIPTION
+.B xfs_scrub
+attempts to check and repair all metadata in a mounted XFS filesystem.
+.PP
+.B xfs_scrub
+asks the kernel to scrub all metadata objects in the filesystem.
+Metadata records are scanned for obviously bad values and then
+cross-referenced against other metadata.
+The goal is to establish a threasonable confidence about the consistency
+of the overall filesystem by examining the consistency of individual
+metadata records against the other metadata in the filesystem across the
+entire filesystem.
+Damaged metadata can be rebuilt from other metadata if there is
+sufficient redundancy (and no other corruption) in the metadata.
+.PP
+This utility does not know how to correct all errors.
+If the tool cannot fix the detected errors, you must unmount the
+filesystem and run
+.B xfs_repair
+to fix the problems.
+If this tool is not run with either of the
+.B \-n
+or
+.B \-y
+options, then it will optimize the filesystem when possible,
+but it will not try to fix errors.
+.SH OPTIONS
+.TP
+.BI \-a " errors"
+Abort if more than this many errors are found on the filesystem.
+.TP
+.B \-b
+Run in background mode.
+If the option is specified once, only run a single scrubbing thread at a
+time.
+If given more than once, an artificial delay of 100us is added to each
+scrub call to reduce CPU overhead even further.
+.TP
+.B \-e
+Specifies what happens when errors are detected.
+If
+.IR shutdown
+is given, the filesystem will be taken offline if errors are found.
+Not all backends can shut down a filesystem.
+If
+.IR continue
+is given, no action taken if errors are found.
+This is the default.
+.TP
+.BI \-m " file"
+Search this file for mounted filesystems instead of /etc/mtab.
+.TP
+.B \-n
+Dry run, do not modify anything in the filesystem.
+This disables all preening and optimization behaviors, and disables
+calling FITRIM on the free space after a successful run.
+.TP
+.BI \-T
+Print timing and memory usage information for each phase.
+.TP
+.B \-v
+Enable verbose mode, which prints periodic status updates.
+.TP
+.B \-V
+Prints the version number and exits.
+.TP
+.B \-x
+Scrub all file data too.
+The block list will be sorted in disk order for better performance.
+.B xfs_scrub
+will issue O_DIRECT reads to the block device directly.
+If the block device is a SCSI disk, it will issue READ VERIFY commands
+directly to the disk.
+.TP
+.B \-y
+Try to repair all filesystem errors.
+If the errors cannot be fixed online, then the filesystem must be taken
+offline for repair.
+.SH EXIT CODE
+The exit code returned by
+.B xfs_scrub
+is the sum of the following conditions:
+.br
+\	0\	\-\ No errors
+.br
+\	1\	\-\ File system errors left uncorrected
+.br
+\	2\	\-\ File system optimizations possible
+.br
+\	4\	\-\ Operational error
+.br
+\	8\	\-\ Usage or syntax error
+.br
+.SH CAVEATS
+.B xfs_scrub
+is an immature utility!
+This program takes advantage of in-kernel scrubbing to verify a given
+data structure with locks held.
+The kernel must support the BULKSTAT, FSGEOMETRY, FSCOUNTS, GET_RESBLKS,
+GETBMAPX, GETFSMAP, INUMBERS, and SCRUB_METADATA ioctls.
+This can tie up the system for a while.
+.PP
+If errors are found and cannot be repaired, the filesystem must be taken
+offline and repaired.
+.SH SEE ALSO
+.BR xfs_repair (8).
diff --git a/scrub/Makefile b/scrub/Makefile
new file mode 100644
index 0000000..62cca3b
--- /dev/null
+++ b/scrub/Makefile
@@ -0,0 +1,42 @@ 
+#
+# Copyright (C) 2018 Oracle.  All Rights Reserved.
+#
+
+TOPDIR = ..
+include $(TOPDIR)/include/builddefs
+
+# On linux we get fsmap from the system or define it ourselves
+# so include this based on platform type.  If this reverts to only
+# the autoconf check w/o local definition, change to testing HAVE_GETFSMAP
+SCRUB_PREREQS=$(PKG_PLATFORM)
+
+ifeq ($(SCRUB_PREREQS),linux)
+LTCOMMAND = xfs_scrub
+INSTALL_SCRUB = install-scrub
+endif	# scrub_prereqs
+
+HFILES = \
+common.h \
+xfs_scrub.h
+
+CFILES = \
+common.c \
+xfs_scrub.c
+
+LLDLIBS += $(LIBHANDLE) $(LIBFROG) $(LIBPTHREAD)
+LTDEPENDENCIES += $(LIBHANDLE) $(LIBFROG)
+LLDFLAGS = -static
+
+default: depend $(LTCOMMAND)
+
+include $(BUILDRULES)
+
+install: default $(INSTALL_SCRUB)
+
+install-scrub:
+	$(INSTALL) -m 755 -d $(PKG_ROOT_SBIN_DIR)
+	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_ROOT_SBIN_DIR)
+
+install-dev:
+
+-include .dep
diff --git a/scrub/common.c b/scrub/common.c
new file mode 100644
index 0000000..0a58c16
--- /dev/null
+++ b/scrub/common.c
@@ -0,0 +1,20 @@ 
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "common.h"
diff --git a/scrub/common.h b/scrub/common.h
new file mode 100644
index 0000000..1082296
--- /dev/null
+++ b/scrub/common.h
@@ -0,0 +1,23 @@ 
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef XFS_SCRUB_COMMON_H_
+#define XFS_SCRUB_COMMON_H_
+
+#endif /* XFS_SCRUB_COMMON_H_ */
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
new file mode 100644
index 0000000..4f26855
--- /dev/null
+++ b/scrub/xfs_scrub.c
@@ -0,0 +1,109 @@ 
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include "xfs_scrub.h"
+
+/*
+ * XFS Online Metadata Scrub (and Repair)
+ *
+ * The XFS scrubber uses custom XFS ioctls to probe more deeply into the
+ * internals of the filesystem.  It takes advantage of scrubbing ioctls
+ * to check all the records stored in a metadata object and to
+ * cross-reference those records against the other filesystem metadata.
+ *
+ * After the program gathers command line arguments to figure out
+ * exactly what the user wants the program is going to do, scrub
+ * execution is split up into several separate phases:
+ *
+ * The "find geometry" phase queries XFS for the filesystem geometry.
+ * The block devices for the data, realtime, and log devices are opened.
+ * Kernel ioctls are test-queried to see if they actually work (the scrub
+ * ioctl in particular), and any other filesystem-specific information
+ * is gathered.
+ *
+ * In the "check internal metadata" phase, we call the metadata scrub
+ * ioctl to check the filesystem's internal per-AG btrees.  This
+ * includes the AG superblock, AGF, AGFL, and AGI headers, freespace
+ * btrees, the regular and free inode btrees, the reverse mapping
+ * btrees, and the reference counting btrees.  If the realtime device is
+ * enabled, the realtime bitmap and reverse mapping btrees are enabled.
+ * Quotas, if enabled, are also checked in this phase.
+ *
+ * Each AG (and the realtime device) has its metadata checked in a
+ * separate thread for better performance.  Errors in the internal
+ * metadata can be fixed here prior to the inode scan; refer to the
+ * section about the "repair filesystem" phase for more information.
+ *
+ * The "scan all inodes" phase uses BULKSTAT to scan all the inodes in
+ * an AG in disk order.  The BULKSTAT information provides enough
+ * information to construct a file handle that is used to check the
+ * following parts of every file:
+ *
+ *  - The inode record
+ *  - All three block forks (data, attr, CoW)
+ *  - If it's a symlink, the symlink target.
+ *  - If it's a directory, the directory entries.
+ *  - All extended attributes
+ *  - The parent pointer
+ *
+ * Multiple threads are started to check each the inodes of each AG in
+ * parallel.  Errors in file metadata can be fixed here; see the section
+ * about the "repair filesystem" phase for more information.
+ *
+ * Next comes the (configurable) "repair filesystem" phase.  The user
+ * can instruct this program to fix all problems encountered; to fix
+ * only optimality problems and leave the corruptions; or not to touch
+ * the filesystem at all.  Any metadata repairs that did not succeed in
+ * the previous two phases are retried here; if there are uncorrectable
+ * errors, xfs_scrub stops here.
+ *
+ * The next phase is the "check directory tree" phase.  In this phase,
+ * every directory is opened (via file handle) to confirm that each
+ * directory is connected to the root.  Directory entries are checked
+ * for ambiguous Unicode normalization mappings, which is to say that we
+ * look for pairs of entries whose utf-8 strings normalize to the same
+ * code point sequence and map to different inodes, because that could
+ * be used to trick a user into opening the wrong file.  The names of
+ * extended attributes are checked for Unicode normalization collisions.
+ *
+ * In the "verify data file integrity" phase, we employ GETFSMAP to read
+ * the reverse-mappings of all AGs and issue direct-reads of the
+ * underlying disk blocks.  We rely on the underlying storage to have
+ * checksummed the data blocks appropriately.  Multiple threads are
+ * started to check each AG in parallel; a separate thread pool is used
+ * to handle the direct reads.
+ *
+ * In the "check summary counters" phase, use GETFSMAP to tally up the
+ * blocks and BULKSTAT to tally up the inodes we saw and compare that to
+ * the statfs output.  This gives the user a rough estimate of how
+ * thorough the scrub was.
+ */
+
+/* Program name; needed for libxcmd error reports. */
+char				*progname = "xfs_scrub";
+
+int
+main(
+	int			argc,
+	char			**argv)
+{
+	fprintf(stderr, "XXX: This program is not complete!\n");
+	return 4;
+}
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
new file mode 100644
index 0000000..ff9c24d
--- /dev/null
+++ b/scrub/xfs_scrub.h
@@ -0,0 +1,23 @@ 
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef XFS_SCRUB_XFS_SCRUB_H_
+#define XFS_SCRUB_XFS_SCRUB_H_
+
+#endif /* XFS_SCRUB_XFS_SCRUB_H_ */
diff --git a/tools/find-api-violations.sh b/tools/find-api-violations.sh
index 3b976d3..cb075ba 100755
--- a/tools/find-api-violations.sh
+++ b/tools/find-api-violations.sh
@@ -6,7 +6,7 @@ 
 
 # NOTE: This script doesn't look for API violations in function parameters.
 
-tool_dirs="copy db estimate fsck fsr growfs io logprint mdrestore mkfs quota repair rtcp"
+tool_dirs="copy db estimate fsck fsr growfs io logprint mdrestore mkfs quota repair rtcp scrub"
 
 # Calls to xfs_* functions in libxfs/*.c without the libxfs_ prefix
 find_possible_api_calls() {