static site checker

content

introduction
why
README
usage
known issues
bug reporting
build
download
boot notes
copyright & licence


introduction

The static site checker is an opinionated HTML nitpicker, a command–line tool to validate static HTML & XHTML websites. I built it to nitpick arts & ego, my hand–coded identity website.

It should not be used on untrusted content; its parsers are holier than Robin’s cow.

If you want to try it, here’s the current source. The build instructions follow.

Dylan Harris
October 2024


why SSC

Why did I make the static site checker? Aren’t there a lot of other HTML validators around? When I checked a few years ago, I couldn’t find a web site validator, only web page validators. Things may have improved. Anyway, my google foo is poo.

My identity website has more than 100,000 pages. I’m too impatient to push each through a validator one by one; I want to validate my site as a whole. Furthermore, single page validators can’t catch inter–page errors, such as broken internal links, let alone hidden links (an otherwise valid link to a HIDDEN element).

Many people avoid such problems by using frameworks. I find frameworks awful. IMAO, they produce dull, boring, trite design. The visual arts world has had centuries to develop excellent form for a rectangular space. Most 21st century frameworks are so crude they haven’t even absorbed 14th century visual arts’ ideas, when painters broke out of rectangular form in a rectangular frame. So much is possible, so much hasn’t happened. I want to break this dull, stultifying, archaic, mutton.

Maybe I’m making the wrong comparison, that the web isn’t about image, it’s about type. The Western visual arts never did really suss mixing writing and form (that’s not really true, but, IMAO, such arts never escaped their context). However, the Eastern visual arts most certainly did, and frameworks haven’t noticed them either.

Enough of this. Rather than criticising other people for not doing, I should do. I need to make some example sites. That’s where SSC comes in.

If I am to build a site using an experimental visual process, I can’t use frameworks. If I can’t use frameworks, I have to hand code. And there’s the key problem: HTML is such a convoluted, evolved mess, that the people who designed it, in their own design presentations, make errors. Ok, I only found this out by testing SSC on them, which conveniently illustrates that HTML is overcomplicated. I’m not going to reveal names because these people are working hard to make the web a better place. Let’s just say W3 had broken links, WhatWG referenced withdrawn ontologies, and many other authors’ sites have other internal inconsistencies. That the people who define the web make mistakes using their own design in their own documents that espouse their design, helps explain why most stick to dull, formulaic, boring, frameworks. To be fair, my HTML is far worse than any of these mild examples of technical naughtiness, which is why I had to write SSC.

I’ve yet to build a site inspired by visual art’s form and layout. My efforts have been spent building SSC, a tool to make that practical.

Since I’m here, I’ll list other issues I have with frameworks:

Dylan Harris
January 2024


README

Static Site Checker
(an opinionated HTML nitpicker)
version 0.2.4
(c) 2020-2024 dylan harris
see LICENCE.txt & LICENSE.txt for copyright & licence notice
https://ssc.lu/
https://github.com/devongarde/ssc



ssc analyses static HTML snippets, files and sites:
— HTML living standard, Jan 2005 to Oct 2024
— HTML 1.0/+/2.0/3.0/3.2/4.00/4.01/5.0/5.1/5.2/5.3–draft
— CSS 1/2.0/2.1/2.2–draft, 2007-2023 snapshots, more
— SVG 1.0/1.1/1.2 Tiny/1.2 Full/2.0/2.x–draft Apr 2021
— MathML 1/2/3/4–draft Jul 2022
— XHTML 1.0/1.1/2.0/5.x
— finds broken links
— server side includes, mostly
— many ontologies

with opinions on:
— standard english where dialect is required
— perfectly legal but sloppy HTML
— abhorrent rudeness such as autoplay on videos

It does NOT:
— analyse or understand scripts
— analyse or understand XML or derivatives, except as noted above

It can output:
— ‘repaired’ HTML (not XHTML)
— HTML with resolved server side includes
— JSON of ontological content
— website statistical information
— deduplicated websites



ssc -h
for a usage summary.

ssc -f config_file
analyse site using preprepared configuration

ssc directory
analyse website based in directory



To build & run:
1. Follow the build instructions in build.txt
2. Gleefully run ssc. It will misbehave if you are insufficiently gleeful.



This is an alpha version of ssc. It may contain unexpected features.
If you encounter such a delight, please help improve ssc by collecting
the following information (where relevant):
— version of ssc;
— precise version of the operating system;
— hardware architecture and system information;
— detailed description of the problem;
— detailed description of the steps to recreate it;
— copy of output file showing the error;
— copy of pages/website being analysed;
— precise command used;
— configuration file(s) used, if any;
— any ndx file or other pre–existing file used during the run;
— any known workarounds or solutions;
— optionally, a dance interpretation of the ‘feature’;
and emailing everything to mail@ssc.lu (if the collected files are more
than small, please use a public fileserver and email the link). Do NOT
send anything confidential. Furthermore, unless you state otherwise,
we reserve the right to publish some or all of the information sent in
future versions of ssc, usually in the test suite. If you have a fix,
you are invited to submit a pull request on github. Thank you.



SSC can be run in a CGI environment. This is intended for use with OpenBSD’s native httpd web server.
You are reminded that SSC is α software. Do NOT expose it to untrusted data
sources, such as the open web, without taking serious precautions. SSC probably has
more bugs than the Creator’s Ultimate All–Beetle Extravaganza (J.B.S.
Haldane, apocryphal : “[the Creator has] an inordinate fondness for beetles.”).



Notes on names:
— recipe: a nod to Vernor Vinge’s “A Fire Upon the Deep”;
— tea: without tea, nothing works; then there’s builders’ tea;
— sauce: makes the dull tasty; identifies incompetent pedants;
— toast: toasts code; i liked burnt toast;
— heater: i’m not stopping now;
— unii: my preferred plural of unix: to my ears, both unixes and unices
        sound like they sing castrato.
— andor: and/or sans ancienne; land of Gift (aber nicht das Gift)


SEE ALSO
build.txt        notes on building ssc
gen.txt          a model man page
usage.txt        how to use ssc
releasenotes.txt chips
LICENCE.txt      ssc licence information
LICENSE.txt      formal GPL 3 licence
more licences    licences for borrowed external content



Background
I have a website, arts & ego, at https://dylanharris.org/. It has
approaching 60G of original content. It contains hand coded HTMLs 2,
3, 4 & 5. It is a complete mess. Despite a long search, I could not
find any tools to properly identify its flaws. Anything I did find
was at most cursory.

Then came the cow flu*.

*corvid means crow, thus covid means cow**.

**by the rules of sympathetic spelling.



Unabashed Opportunism
If you appreciate modernist poetry or abstract photography, I’ve been
published. Click on books at arts & ego for gen.



written by dylan harris
mail@ssc.lu
October 2024

usage

NAME
ssc - static site checker



SYNOPSIS
ssc [...] directory
ssc -f config
ssc



DESCRIPTION
ssc (the Static Site Checker) is an opinionated HTML nit-picker,
intended for people, like its author, who hand code websites. It
doesn't just check static sites for broken links, dubious syntax, and
bad semantic data; it will actively complain about things that are
perfectly legal but rather untidy, like its author.

Except when serving CGI queries, it recursively scans the directory
seeking HTML & related files to analyse. It produces a list of errors,
warnings, and other hints of imperfection.

Scripts are ignored.



COMMAND LINE ONLY SWITCHES

These options are only available on the command line:

-a                      Ask the user to enter arguments, answer them;
                        ask and answer again, until a blank line is
                        entered. Arguments entered on the command line
                        (with -a) will be processed as normal. Certain
                        arguments entered during the argument answer
                        cycle will be ignored, including -a and thread
                        counts. (likely to be withdrawn)
                        
-A                      List switches read, then exit.

-f file                 Load a configuration from a file, which should
                        be in .INI file format. See CONFIGURATION FILE
                        FORMAT below. This should be an absolute path.

-F                      Load the configuration file .ssc/config in the
                        current directory.

-h                      Show a summary of switches, then exit.

-H snippet              Only nitpick this snippet of HTML.

--ontology.list         List known schema versions, then exit.

-q                      Use a simple shell. The shell accepts the
                        following commands:
                            c   configure: enter a series of command
                                line switches, one per line, then a
                                full stop on a line by itself
                            C   clear the current configuration
                            h   a summary of available commands
                            p   print the current configuration
                            q   quit
                            r   run using the current configuration

-V                      Show version details, then exit.

--validation            List extendable attribute types, then exit.
                        These types accept additional values on some
                        X/HTML attributes and CSS properties. It is
                        intended to allow checking of HTML etc. with
                        bespoke extensions.



COMMAND LINE AND CONFIGURATION FILE SWITCHES

These options are available on the command line (with dashes) and in
configuration files (without dashes). The short form single letter
alternative switches only work on the command line.

Most binary options, i.e. those without arguments below that turn on a
feature (which may be the default), have a corresponding "no-" switch
to turn it off. The "no-" is inserted after the dot, so, for example,
the opposite of "--general.noh" would be "--general.no-noh". When
both are specified, perhaps one in a configuration file and the other
on the command line, the "no-" switch always applies.
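
The naming and precedence rule for "no-" switches can be sketched in a
few lines of Python. This is an illustration only, not ssc's own code:
the helper names negated and resolve are invented here, and the sketch
assumes every switch has the form --section.name.

```python
from typing import Optional, Set


def negated(switch: str) -> str:
    """Return the "no-" form of a switch: --general.noh -> --general.no-noh."""
    section, dot, name = switch.partition(".")
    if not dot or not name:
        raise ValueError(f"expected --section.name, got {switch!r}")
    # The "no-" goes straight after the dot, as the text describes.
    return f"{section}.no-{name}"


def resolve(switch: str, given: Set[str]) -> Optional[bool]:
    """Say whether a binary switch ends up on (True), off (False),
    or untouched (None, meaning its default applies).

    'given' is every switch collected from the configuration file and
    the command line together. If both forms are present, "no-" wins.
    """
    if negated(switch) in given:
        return False
    if switch in given:
        return True
    return None


# Both forms given, so the "no-" form applies.
print(resolve("--general.noh", {"--general.noh", "--general.no-noh"}))
```

The sketch pools both sources into one set because, by the rule above,
it does not matter where each form came from.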

Corpus
Corpus switches control XML data for output to a local search engine.

--corpus.article        Prefer the content of <ARTICLE> when gathering
                        corpus text.

--corpus.body           Prefer the content of <BODY> when gathering
                        corpus text. This is the default.

--corpus.main           Prefer the content of <MAIN> when gathering
                        corpus text.

--corpus.output file    Dump XML corpus of site into file. This is
                        intended for use by a local search engine. If
                        none of --corpus.article, --corpus.body, or
                        --corpus.main are specified, the content of
                        <BODY> is used. If more than one is specified,
                        then the text collected depends on a page's
                        content. This is incompatible with
                        --shadow.update.

CSS
The CSS switches precisely control CSS interpretation. If you are
checking a site with a CSS version contemporary to the given HTML
version (see --html.version), you can ignore them. Otherwise, you
probably only need --css.version. The other switches allow you to
precisely specify the CSS modules presumed. Specific modules are
defined at w3.org.

--css.adjust X          Use CSS Colour Adjustment level X, where X is 0
                        or 3.

--css.anchor X          Use CSS Scrollbar Anchoring level X, where X is
                        0 or 3.

--css.animation X       Use CSS Animation level X, where X is 0, 3 or
                        4.

--css.background X      Use CSS Backgrounds and Borders level X, where
                        X is 0 or 3.
                        
--css.box-align X       Use CSS Box Alignment level X, where X is 0 or
                        3.

--css.box-model X       Use CSS Box Model level X, where X is 0, 3 or
                        4.
                        
--css.box-sizing X      Use CSS Box Sizing level X, where X is 0, 3 or
                        4.

--css.cascade X         Use CSS Cascading and Inheritance level X,
                        where X is 0, 3, 4, 5 or 6.

--css.colour X          Use CSS Colour level X, where X is 0, 3, 4 or
                        5.

--css.compositing X     Use CSS Compositing and Blending level X, where
                        X is 0 or 3.

--css.cond-rule X       Use CSS Conditional Rules level X, where X is
                        0, 3, 4, or 5.

--css.contain X         Use CSS Contain level X, where X is 0, 3, 4 or
                        5: see --css.version for gen.

--css.content X         Use CSS Generated Content level X, where X is 0
                        or 3.

--css.cs X              Use CSS Counter Style level X, where X is 0 or
                        3.

--css.custom X          Use CSS Custom Properties for Cascading
                        Variables level X, where X is 0 or 3.

--css.device X          Use CSS Device Adaptation level X, where X is 0
                        or 3.

--css.display X         Use CSS Display level X, where X is 0 or 3.

--css.ease X            Use CSS Easing Functions level X, where X is 0
                        or 3.

--css.exclude X         Use CSS Exclusions level X, where X is 0 or 3.

--css.extension ext     Presume files with extension '.ext' are CSS
                        files.

--css.fbl X             Use CSS Flexible Box Layout level X, where X is
                        0 or 3.

--css.filter X          Use CSS Filter Effects level X, where X is 0 or
                        3.

--css.float X           Use CSS Page Floats level X, where X is 0 or 3.

--css.font X            Use CSS Fonts level X, where X is 0, 3, 4 or 5.

--css.frag X            Use CSS Fragmentation level X, where X is 0, 3
                        or 4.

--css.grid X            Use CSS Grid level X, where X is 0, 3 or 4: see
                        --css.version for gen.

--css.highlight X       Use CSS Custom Highlights level X, where X is 0
                        or 3.

--css.image X           Use CSS Images level X, where X is 0, 3 or 4.

--css.inline X          Use CSS Inline Layout level X, where X is 0 or
                        3.

--css.line-grid X       Use CSS Line Grid level X, where X is 0 or 3.

--css.list X            Use CSS Lists and Counters level X, where X is
                        0 or 3.

--css.logic X           Use CSS Logical Properties level X, where X is
                        0 or 3.

--css.marquee X         Use CSS Marquee level X, where X is 0 or 3.

--css.masking X         Use CSS Masking level X, where X is 0 or 3.

--css.media X           Use CSS Media Queries level X, where X is 0, 3,
                        4 or 5.

--css.mobile            Test against the CSS Mobile Profile.

--css.multi-column X    Use CSS Multi-Column level X, where X is 0 or
                        3.

--css.namespace X       Use CSS Namespaces level X, where X is 0 or 3.

--css.nes X             Use CSS Non-Element Selectors level X, where X
                        is 0 or 3.

--css.overflow X        Use CSS Overflow level X, where X is 0, 3 or 4.

--css.overscroll X      Use CSS Overscroll Behaviour level X, where X
                        is 0 or 3.

--css.page X            Use CSS Paged Media level X, where X is 0 or 3.

--css.position X        Use CSS Positions level X, where X is 0 or 3.

--css.present X         Use CSS Presentation Levels level X, where X is
                        0 or 3.

--css.print             Test against the CSS Print Profile.

--css.region X          Use CSS Regions level X, where X is 0 or 3.

--css.rhythm X          Use CSS Rhythmic Sizing level X, where X is 0
                        or 3.

--css.round X           Use CSS Round Display level X, where X is 0 or
                        3.

--css.ruby X            Use CSS Ruby Annotations level X, where X is 0
                        or 3.

--css.scope X           Use CSS Scoping level X, where X is 0 or 3.

--css.scrollbar X       Use CSS Scrollbar Style level X, where X is 0
                        or 3.

--css.sda X             Use CSS Scroll-Driven Animations level X, where
                        X is 0 or 3.

--css.selector X        Use CSS Selectors level X, where X is 0, 3 or
                        4.

--css.shadow X          Use CSS Shadow Parts level X, where X is 0 or
                        3.

--css.shape X           Use CSS Shapes level X, where X is 0, 3 or 4.

--css.snap X            Use CSS Scroll Snap level X, where X is 0 or 3.

--css.spatial X         Use CSS Spatial Navigation level X, where X is
                        0 or 3.

--css.speech X          Use CSS Speech level X, where X is 0 or 3.

--css.style X           Use CSS Style level X, where X is 0 or 3.

--css.syntax X          Use CSS Syntax level X, where X is 0 or 3.

--css.table X           Use CSS Tables level X, where X is 0 or 3 (this
                        is an experimental spec, likely to change).

--css.text X            Use CSS Text level X, where X is 0, 3 or 4.

--css.text-dec X        Use CSS Text Decoration level X, where X is 0,
                        3 or 4.

--css.tv                Test against the CSS TV Profile.

--css.transform X       Use CSS Transforms level X, where X is 0, 3 or
                        4: see --css.version for gen.

--css.transition X      Use CSS Transitions level X, where X is 0 or 3.

--css.ui X              Use CSS Basic User Interface level X, where X
                        is 0, 3 or 4.

--css.value X           Use CSS Values and Units level X, where X is 0,
                        3 or 4.

--css.verify            Verify CSS files (replaces --general.css).

--css.version X         Presume version X of CSS, where X is one of:
                            1     CSS 1.0
                            2.0   CSS 2.0
                            2.1   CSS 2.1
                            2.2   CSS 2.2 (Feb 2022 draft)
                            3     all CSS level 3 so far
                            4     all CSS level 4 so far
                            5     all CSS level 5 so far
                            6     all CSS level 6 so far
                            2007
                            2010
                            2015
                            2015+
                            2015++
                            2017
                            2017+
                            2017++
                            2018
                            2018+
                            2018++
                            2020
                            2020+
                            2020++
                            2021
                            2021+
                            2021++
                            2022
                            2022+
                            2022++
                            2023
                            2023+
                            2023++
                            2024
                            2024+
                            2024++
                        The years are CSS snapshots: the year itself
                        for stable modules, + for wobbly modules, and
                        ++ for wibbly-wobbly modules, as per the
                        corresponding W3 CSS snapshots (the terminology
                        in those snapshots is inconsistent, hence our
                        use of the scientific terms wobbly and
                        wibbly-wobbly).
                        For levels 3, 4, 5 and 6, note that extensions
                        that are part of neither CSS 1 nor CSS 2.x
                        specifications are numbered three and upwards
                        in ssc, for internal consistency. If you wish
                        to use an extension named ... level 1, that is
                        not part of CSS 1, specify 3. Similarly, for
                        those named level 2 that are not part of any
                        CSS 2 specification, etc.

--css.view X            Use CSS View Transitions level X, where X is 0
                        or 3.

--css.wc X              Use CSS Will Change level X, where X is 0 or 3.

--css.writing X         Use CSS Writing Mode level X, where X is 0, 3
                        or 4.

General switches
These are switches that don't really belong in any other section.

--general.class         Nitpick class values.

--general.classic       Report all classes used, not just those in
                        CSS files.

--general.cgi           Check environment variables for snippets of
-W                      HTML. SSC expects environment variables as
                        produced by OpenBSD's native httpd, produced
                        using <FORM METHOD=GET ...>. Do NOT let ssc
                        anywhere near untrusted data. Ignores many
                        options such as shadowing.

--general.datapath dir  Look for any configuration, caches, and other
-C dir                  useful files, in this directory.

--general.defthrd N     If --general.thread is not given, then set the
-Y N                    number of threads to N. The default is 1. If 0
                        is specified, then select a number of threads
                        not entirely inappropriate for the hardware.

--general.exclude xxx   Ignore all paths containing xxx. May be
                        repeated. Case independent under Windows only.
                        .DS_Store is always excluded under darwin.

--general.file XXX      File for persistent data. See also
                        --general.datapath. Default extension: .ndx.

--general.info          Report launch context when starting.

--general.maxfilesize n Do not process HTML source files that exceed n
                        bytes in size (default: 4M). Specify 0 for
                        unlimited, although be warned that ssc is
                        stunningly stupid in such circumstances and may
                        even attempt to load files bigger than
                        available memory.

--general.output file   Output to the specified file. If this switch is
-o file                 not used, standard output is used.

--general.progress      Dump progress information to standard output.
-D                      This can interfere with formatted output.

--general.rdfa          Check RDFa attributes (version 1.1.3). This is
                        intended for ontology testing only, so is
                        incomplete.

--general.rpt           Report CSS files that are opened.

--general.spec          Reset the values of most switches to false.
-j

--general.test          Output data in automated test format. Used by
-T                      ssc-test. Not generally useful. Documented so
                        you can avoid using it!

--general.thread N      Use N threads when running. Defaults to 1. If
-y N                    0 is given, a value not entirely inappropriate
                        for the hardware is used. Too high a value can
                        cause problems. See also --general.defthrd.

--general.vcs           Excludes, as per --general.exclude, files and
                        directories called:
                            .bazaar
                            .bk
                            CVS
                            .cvsignore
                            _darcs
                            .fslckout
                            .git
                            .gitattributes
                            .gitignore
                            .gitmodules
                            .pijul
                            RCS
                            SCCS
                            .svn

HTML
The only HTML switch you are likely to need is --html.version, and then
only if you want to check a site that is not contemporary to the build
of ssc. The remaining switches allow you to precisely control analysis
of older sites.

--html.custom EL        Define a custom element <EL> for verifying the
                        IS attribute. May be repeated.

--html.force            If <!DOCTYPE...> is missing, force presumption
                        of --html.version value, not HTML 1/tags.

--html.ie               Don't mention certain Internet Explorer
                        'features'.

--html.ignore EL        Ignore attributes and content of the element
                        <EL>. May be repeated.

--html.lang LA          If an X/HTML file does not have a language /
                        dialect specified (e.g. "en" for generic
                        English, "en-IE" for Irish English, "lb-LU"
                        for Luxembourgish, "ma" for Marain, etc.),
                        default to 'LA'. If not given, the default is
                        your system default, or, if none, then "en-US".

--html.rel              Only mention <LINK> REL values, found neither
                        in the living standard nor at microformats.org,
                        in debug output.

--html.rfc1867          Ignore the RFC 1867 (INPUT TYPE=FILE) extension
                        when processing HTML 2.0.

--html.rfc1942          Ignore the RFC 1942 (tables) extension when
                        processing HTML 2.0.

--html.rfc1980          Ignore the RFC 1980 (client side image maps)
                        extension when processing HTML 2.0.

--html.rfc2070          Ignore the RFC 2070 (internationalisation)
                        extension when processing HTML 2.0.

--html.ruby             Accept Ruby Markup Extension (draft, late April
                        2024) for HTML from May 2024 onwards.

--html.safari           Don't mention certain early Safari 'features'.

--html.sloven           Ignore perfectly legal yet inefficient, indeed
                        thoroughly slovenly, HTML, such as being far
                        too lazy to bother to get round to closing
                        elements.

--html.ssi              Process Server Side Includes (SSIs). Note ssc
-I                      cannot process SSI directives with formulae.
                        Processing SSIs may cause incorrect line
                        numbers to be mentioned when an issue is
                        reported.

--html.tags             When an HTML file is loaded that contains no
                        DOCTYPE, ssc normally presumes HTML 1. This
                        switch tells it to presume the file conforms
                        to an earlier HTML Tags specification (the one
                        at CERN). This is overridden by --html.version.

--html.title n          If <TITLE> text is longer than n characters,
-z n                    say so. This applies to text enclosed by a
                        <TITLE> element under <HEAD>, not the value of
                        TITLE attributes.

--html.version X        If no doctype (or xml header) is specified,
                        presume version X of HTML. X can be:
                            tags  HTML tags (1991, informal)
                            1     HTML 1.0 (Jun 1993 draft)
                            1.0   HTML 1.0 (Jun 1993 draft)
                            +     HTML Plus (Nov 1993 draft)
                            2     HTML 2.0
                            2.0   HTML 2.0
                            3     HTML 3.2
                            3.0   HTML 3.0 (Mar 1995 draft)
                            3.2   HTML 3.2
                            4     HTML 4.01
                            4.0   HTML 4.0
                            4.1   HTML 4.01
                            4.2   XHTML 1.0
                            4.3   XHTML 1.1 core
                            4.4   XHTML 2.0 (Dec 2010 draft)
                            5     recent WhatWG HTML 5
                            5.0   W3 HTML 5.0
                            5.1   W3 HTML 5.1
                            5.2   W3 HTML 5.2
                            5.3   W3 HTML 5.3 (Oct 2018 draft)

                            2005/1/1    WhatWG WebApps draft (Jan 2005)
                            ...         (half-yearly)
                            2007/1/1    WhatWG WebApps draft (Jan 2007)
                            2007/7/1    WhatWG HTML 5 (Jul 2007)
                            ...         (half-yearly)
                            2021/1/1    WhatWG HTML 5 (Jan 2021)
                            ...         (quarterly)
                            2024/4/1    WhatWG HTML 5 (Apr 2024)

                            XHTML 1.0   XHTML 1.0
                            XHTML 1.1   XHTML 1.1 core
                            XHTML 2.0   (Dec 2010 draft)
                            XHTML 5.x   XHTML corresponding to
                                        equivalent W3 HTML

                        Although you can specify exact dates for
                        versions of the WhatWG HTML 5 living standard,
                        currently only broad versions published in
                        January and July are supported (quarterly
                        from 2021).

                        Certain versions of HTML offer variants, such
                        as loose and strict definitions. ssc picks
                        those up from the <!DOCTYPE ...> in the HTML
                        file, if any, and then carefully ignores them.

                        Validation of XHTML is even less strict.

                        Just to remind you, there are no guarantees of
                        accuracy (or inaccuracy).

                        Copies of the appropriate standards can be
                        found online. A copy of the copies referenced
                        during ssc's development can be found at
                        https://ssc.lu/.
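
To make the pattern of dated --html.version values concrete, this
Python sketch lists them as the table above describes: January and July
from 2005, then quarterly from 2021, ending at 2024/4/1. It is an
illustration, not ssc's code. The function names are invented, and the
sketch assumes that the quarters are January, April, July and October
and that every date in the pattern is accepted.

```python
from datetime import date


def snapshot_dates(last=date(2024, 4, 1)):
    """List the living standard snapshot dates up to and including 'last'."""
    dates = []
    for year in range(2005, last.year + 1):
        # Half-yearly before 2021, quarterly from 2021 onwards.
        months = (1, 4, 7, 10) if year >= 2021 else (1, 7)
        for month in months:
            snapshot = date(year, month, 1)
            if snapshot <= last:
                dates.append(snapshot)
    return dates


def as_switch_value(snapshot):
    """Format a date the way the table writes it, e.g. 2007/7/1."""
    return f"{snapshot.year}/{snapshot.month}/{snapshot.day}"


for snapshot in snapshot_dates()[-3:]:
    print("--html.version", as_switch_value(snapshot))
```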

Link switches
If you want to check links on the site, you'll find these switches
useful, particularly --link.external. --link.check is a must: it spots
broken links within the site.

--link.301              Normally, when ssc checks external links
-3                      (--link.external), it does not report http
                        forwarding errors 301 and 308. Use this switch
                        to have it do so.

--link.check            Check internal links, i.e. those within the
-l                      website being analysed.

--link.example          Report links to faux domains, as defined by RFC
                        2606 (note ssc also reports links to
                        example.edu, example.gov & example.mil).

--link.external         Check external links, i.e. those not on the
-e                      site being checked. Note that ssc will NOT
                        check RFC 2606 links, such as example.com (see
                        --link.example).

--link.forward          Report HTTP forwarding errors encountered when
                        checking external links (e.g. 301 and 308).

--link.ignore DOMAIN    When checking external links, ignore this
                        domain. May be repeated.

--link.local            Report links to local domains, such as domains
                        ending in .lan, .home, .corp, and others.

--link.once             Only report each broken external link once. If,
-O                      for example, the site has a number of references
                        to a page that does not exist, ssc will only
                        report the first instance of the broken link.
                        Note that, even if it reports every occurrence
                        of the link, it will only check it the first
                        time it's encountered (requires
                        --link.external).

--link.pretend FILE     Pretend links containing FILE exist. May be
                        repeated.

--link.report DOMAIN    Report links to DOMAIN and its descendants. May
                        be repeated.

--link.revoke           Do not check whether links' https certificates
-r                      have been revoked (requires --link.external).

--link.xlink            Check crosslink IDs on the site being analysed.
-X                      For example, if a link goes to /index.html#id,
                        then, when this switch is set, ssc will verify
                        that the id exists and that it is not hidden.
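
The check described above can be sketched in Python with the standard
html.parser module. This is an illustration only, not ssc's
implementation: the names IdCollector and check_fragment are invented,
and "hidden" is simplified to mean that the element carrying the id has
a hidden attribute (hidden ancestors are not considered).

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Record every id in a page, and whether its element is hidden."""

    def __init__(self):
        super().__init__()
        self.ids = {}  # id -> True if the element has a hidden attribute

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "id" in attrs:
            self.ids[attrs["id"]] = "hidden" in attrs


def check_fragment(html, fragment):
    """Return 'ok', 'hidden' or 'missing' for a link to #fragment."""
    collector = IdCollector()
    collector.feed(html)
    if fragment not in collector.ids:
        return "missing"
    return "hidden" if collector.ids[fragment] else "ok"
```

For a link to /index.html#id, the page would be index.html and the
fragment would be id.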

MathML switches
These switches are useful when you have some MathML which is not
contemporary to the corresponding HTML.

--math.version N        Presume version N of MathML (1, 2, 3 or 4). The
                        following versions are supported:
                                0       based on the HTML version
                                1       MathML 1
                                2       MathML 2
                                3       MathML 3
                                4.20    MathML 4 2020 draft
                                4       MathML 4 2022 draft
                                core    MathML 4 core (May 2022 draft)

Microformat switches
These switches are useful for checking and/or outputting any microformat
data found.

--microformat.export    Export microformat data encountered in JSON
                        format. This option will write files in the
                        same directory as the source, with the
                        extension .json.

--microformat.verify    Verify Microformats data in class and rel
-M                      attributes (see https://microformats.org/).

--microformat.version x Presume microformats version x. The following
                        values are currently accepted:
                                1   microformats version 1 only
                                2   microformats version 2 only
                                3   both microformats versions 1 and 2

Nits
Nits are the output of ssc, the static site NITpicker. You will need
these switches if you want to hide certain nits, output lots of extra
gen, etc.

--nits.abhorrent n      Redefine nit n as an abhorrence; may be
                        repeated (the value of n can be determined
                        using --nits.nids below).

--nits.catastrophe n    Redefine nit n as a catastrophe; may be
                        repeated (the value of n can be determined
                        using --nits.nids below).

--nits.comment n        Redefine nit n as a comment; may be repeated
                        (the value of n can be determined using
                        --nits.nids).

--nits.debug n          Redefine nit n as a debug message; may be
                        repeated (the value of n can be determined
                        using --nits.nids).

--nits.error n          Redefine nit n as an error; may be repeated
                        (the value of n can be determined using
                        --nits.nids).

--nits.errorexit x      If nits of the specified category or worse are
-E                      generated, then, on exit, return an error code.
                        Values are: 'catastrophe', 'error' (the
                        default), 'warning', 'info', or 'comment'.

--nits.expand           Expand text content of certain nits.

--nits.extra            Report additional nits.

--nits.format F         Specify the output format; F is a template
                        file (see OUTPUT TEMPLATE below).

--nits.info n           Redefine nit n as information; may be repeated
                        (the value of n can be determined using
                        --nits.nids).

--nits.nids             Output nit ids, which can be used to redefine
                        nits.

--nits.override F       Use this output format, not the one specified
                        by --nits.format. F is a template file (see
                        OUTPUT TEMPLATE below). This switch is intended
                        to aid automation.

--nits.quote X          Specify quote style when using --nits.format. X
                        can be 'text' or 'html'.

--nits.root             By default, seek nit output template files in
                        the website root.

--nits.silence n        Silence nit n; may be repeated (the value of n
                        can be determined using --nits.nids).

--nits.unique           Do not output repeated nits, even if they may
                        contain additional information.

--nits.verbose x        Output nits to the specified verbosity:
-v                      'catastrophe', 'abhorrent', 'error', 'warning',
                        'info' (the default), 'comment', or '0' for
                        silence. Additional values are available when
                        debugging. Each level includes its preceding
                        level, so, for example, 'warning' will also
                        output 'catastrophe', 'abhorrent', and 'error'
                        nits.
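
The way each verbosity level includes the levels before it can be
sketched as follows. The ordering is taken from the description above;
the function name reported is invented, and the additional debugging
values are left out.

```python
# Categories from most to least severe, as listed for --nits.verbose.
LEVELS = ["catastrophe", "abhorrent", "error", "warning", "info", "comment"]


def reported(verbosity):
    """Return the nit categories output at the given verbosity."""
    if verbosity == "0":
        return []  # '0' means silence
    return LEVELS[: LEVELS.index(verbosity) + 1]
```

So a verbosity of 'warning' yields catastrophe, abhorrent, error and
warning nits, but neither info nor comment nits.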

--nits.warning n        Redefine nit n as a warning; may be repeated
                        (the value of n can be determined using
                        --nits.nids).

--nits.watch            Output debug nits (intended for automation).

Ontology switches
If you are interested in checking and/or hoovering ontology data, you
may find these switches useful. Note that ssc only knows about certain
ontologies (see --ontology.list).

--ontology.export       Export ontologies encountered. This data is
                        exported in JSON format (not JSON-LD).

--ontology.root DIR     When exporting ontologies with
                        --ontology.export, write files into the
                        directory DIR. ssc will create the directory
                        tree structure as appropriate.

--ontology.verify       Check ontology found in WhatWG living standard
                        microdata attributes (itemprop, itemtype,
                        etc.).

--ontology.virtual v=d  When exporting ontologies using
                        --ontology.export, export the contents of
                        virtual directory 'v' to 'd'. 'v' must match a
                        directory identified with --site.virtual. For
                        example:
                            --ontology.virtual virtual=X:\virtual.

--ontology.ONT X.Y      Presume version X.Y of ontology ONT. For
                        example:
                            --ontology.xsd 1.1
                        defaults usage of XSD to version 1.1. This
                        versioning applies to RDFa, microdata, and
                        microformats (using class) analysis. If .Y is
                        omitted, .0 is presumed. X must be present.
                        Unspecified defaults are derived from the HTML
                        version. For a list of possible values, use
                        --ontology.list.

                        At the time of writing, the following ontology
                        versions can be verified. Note that single
                        version ontologies cannot have their version
                        changed:
                                adms 1.0,2.0
                                article 12,14,18,22
                                as 1.0,2.0
                                basic 1.0-1.3,2.1,3.0 (see below)
                                bfo 2.0,2020 (see below)
                                bibo 1.3
                                biro 1.1
                                book 12,14,18,22
                                cc 1.0
                                cito 2.8
                                content 1.0
                                crs 1.0 (see below)
                                csvw 1.0
                                ctag 1.0
                                daq 1.0
                                dbp 1.0
                                dbp-owl 1.0
                                dbr 1.0
                                dc11 1.0,1.1
                                dcam 1.0
                                dcat 1.0,2.0
                                dcmi 1.0
                                dcterms 1.0,1.1
                                ddi 1.0
                                doap 1.0
                                dpv* 0.1-2.0 (see below)
                                dqv 1.0
                                describedby 1.0
                                duv 1.0
                                earl 1.0
                                event 1.0
                                exif 1.0-3.0 (see below)
                                exifex 2.21-3.0 (see below)
                                foaf 0.1-0.99
                                frbr_core 1.0
                                gr 1.0
                                grddl 1.0
                                gs1 1.1-1.5
                                ical 1.0
                                icaltzd 1.0
                                jsonld 1.0,1.1
                                ldp 1.0
                                license 1.0
                                locn 1.0
                                ma 1.0
                                mf 1.0-2.255
                                music 12,14,18,22
                                oa 1.0
                                odrl 1.0
                                og 10,12,14,18,22 (see below)
                                org 1.0
                                owl 1.0,2.0
                                pam 2.0 (see below)
                                pcm 3.1 (see below)
                                pcmm 3.0 (see below)
                                pcv 1.0 (see below)
                                pdf 1.0 (see below)
                                photoshop 1.0 (see below)
                                pim 1.0-3.0 (see below)
                                pmi 3.0 (see below)
                                poetry 1.0,1.1
                                prism 1.0-3.0 (see below)
                                prism-ad 3.0 (see below)
                                prl 1.0-2.0 (see below)
                                prm 3.0 (see below)
                                prs 3.1 (see below)
                                profile 12,14,18,22
                                prov 1.0
                                psv 1.0 (see below)
                                ptr 1.0
                                pur 2.1-3.0 (see below)
                                qb 1.0
                                rdf 1.0-1.3
                                rdfa 1.0-1.3
                                rdfg 1.0
                                rdfs 1.0
                                rev 1.0
                                rif 1.0
                                role 1.0
                                rr 1.0
                                schema.org 0.10-28 (see below)
                                sd 1.0
                                sioc 1.0
                                sioc_s 1.0
                                sioc_t 1.0
                                skos 1.0
                                skosxl 1.0
                                sosa 1.0
                                ssn 1.0
                                stdim 1.0 (see below)
                                stevt 1.0 (see below)
                                stfnt 1.0 (see below)
                                stjob 1.0 (see below)
                                stref 1.0 (see below)
                                stver 1.0 (see below)
                                taxo 1.0
                                tiff 6.0
                                time 1.0
                                v 1.0
                                vann 1.0,1.1
                                vcard 1,2,3,4 (see below)
                                video 12,14,18,22
                                void 1.0
                                wdr 1.0
                                wdrs 1.0
                                website 12,14,18,22
                                wwg 1.0
                                xhv 1.0
                                xml 1.0
                                xmp 1.0 (see below)
                                xmpdm 1.0 (see below)
                                xmpg 1.0 (see below)
                                xmpgimg 1.0 (see below)
                                xmpidq 1.0 (see below)
                                xmpmm 1.0 (see below)
                                xmprights 1.0 (see below)
                                xmptpg 1.0 (see below)
                                xsd 1.0,1.1

                        The various Adobe ontologies (crs, pdf,
                        photoshop, stdim, stevt, stfnt, stjob, stref,
                        stver, xmp, xmpdm, xmpg, xmpgimg, xmpidq,
                        xmpmm, xmprights, xmptpg) have only been
                        partially applied. They do not seem to have
                        been designed for microdata, hence the partial
                        implementation: the goal is to enable hoovering
                        to JSON.

                        BFO (Basic Formal Ontology) versions should be
                        specified as follows:
                                Use         For
                                2.0         2.0
                                2.2         2020

                        BFO 2020 uses OBO's machine code style
                        identifiers. Given the history of computing
                        science, as a convenience for users, and with
                        my experience of both devops and maintaining
                        code, identifiers following the standard
                        ontology naming convention are also accepted.
                        Since this is unofficial, both standard English
                        and American dialect spellings are processed.

                        The data privacy family of ontologies follow
                        this versioning scheme:
                            Use     For
                            0.10    0.1
                            0.20    0.2
                            0.30    0.3
                            0.40    0.4.0
                            0.41    0.4.1
                            0.42    0.4.2
                            0.50    0.5
                            0.60    0.6
                            0.70    0.7
                            0.80    0.8.0
                            0.81    0.8.1
                            0.82    0.8.2
                            0.90    0.9
                            1.0     1
                            2.0     2
                        The data privacy ontology versions:
                            ai              2
                            dpv             0.1-2
                            eu-aiact        2
                            eu-dga          2
                            eu-gdpr         2
                            eu-nis2         2
                            eu-rights       2
                            gdpr            0.1-1
                            justifications  2
                            legal           0.5-1
                            legal-de        2
                            legal-eu        2
                            legal-gb        2
                            legal-ie        2
                            legal-in        2
                            legal-us        2
                            loc             2
                            nace            0.1-1
                            pd              0.4-2
                            rights-eu       0.8-2
                            risk            0.8-2
                            tech            0.8-2
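
The "Use" column in the versioning scheme above appears to follow a
simple rule: 0.x releases gain a trailing patch digit (0 when there is
none), and whole-number releases gain ".0". The Python sketch below
encodes that reading of the table. The function name dpv_switch_value
is invented, and the rule is inferred from the table rather than taken
from ssc.

```python
def dpv_switch_value(official):
    """Map an official version (e.g. '0.4.1') to its 'Use' column value."""
    parts = official.split(".")
    if parts[0] != "0":
        # Whole-number releases: 1 -> 1.0, 2 -> 2.0
        return f"{parts[0]}.0"
    minor = parts[1]
    patch = parts[2] if len(parts) > 2 else "0"
    # 0.x releases: 0.1 -> 0.10, 0.4.1 -> 0.41
    return f"0.{minor}{patch}"
```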

                        The Exif & ExifEx ontologies have the following
                        versions:
                            Use     For
                            1.0     1.0 (exif only)
                            1.1     1.1 (exif only)
                            2.0     2.0 (exif only)
                            2.10    2.1 (exif only)
                            2.20    2.2 (exif only)
                            2.21    2.21
                            2.30    2.3
                            2.31    2.31
                            2.32    2.32
                            3.0     3.0
                        Manufacturers' extensions to EXIF are omitted,
                        with exceptions.

                        Open Graph versions correspond to snapshots of
                        the specs from 2010, 2012, 2014, 2018 & 2022.

                        The various Prism ontologies (pam, pamp, pcm,
                        pcmm, pcv, pim, pmi, prism, prism_ad, prl, prm,
                        prs, psv, pur) have only been partially
                        applied: some specifications are unavailable,
                        some specifications break HTML5 syntax. Prism
                        was not designed for microdata, hence the
                        partial implementation: the goal is to enable
                        hoovering to JSON.

                        Most versions of schema (schema.org) should be
                        specified by their version number, but this
                        doesn't work with early versions, which should
                        be specified as follows:
                                Use         For
                                0.10        June 2011
                                0.15        July 2011
                                0.20        August 2011
                                0.25        September 2011
                                0.30        October 2011
                                0.35        November 2011
                                0.40        December 2011
                                0.45        January 2012
                                0.50        February 2012
                                0.55        March 2012
                                0.60        April 2012
                                0.91-0.99   as version number
                                1.0         1.0a
                                1.1         1.0b
                                1.2         1.0c
                                1.3         1.0d
                                1.4         1.0e
                                1.5         1.0f
                                1.10        1.1
                                1.20        1.2
                                1.30        1.3
                                1.40        1.4
                                1.50        1.5
                                1.60        1.6
                                1.70        1.7
                                1.80        1.8
                                1.90        1.9
                                1.91        as version number
                                ...
                                28          as version number
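
The schema.org table above can be read as three rules: monthly
snapshots from June 2011 to April 2012 step by 0.05 from 0.10, the
lettered 1.0 releases map to 1.0 through 1.5, and 1.1 through 1.9 gain
a trailing zero. The Python sketch below encodes that reading. Both
function names are invented, and the rules are inferred from the table,
not taken from ssc.

```python
import re


def ssc_schema_snapshot(year, month):
    """Map an early monthly snapshot (June 2011 .. April 2012) to its value."""
    n = (year - 2011) * 12 + (month - 6)  # months since June 2011
    if not 0 <= n <= 10:
        raise ValueError("only June 2011 to April 2012 are monthly snapshots")
    return f"0.{10 + 5 * n}"


def ssc_schema_version(official):
    """Map an official release name to the value in the 'Use' column."""
    m = re.fullmatch(r"1\.0([a-f])", official)
    if m:
        # 1.0a .. 1.0f -> 1.0 .. 1.5
        return f"1.{ord(m.group(1)) - ord('a')}"
    if re.fullmatch(r"1\.[1-9]", official):
        # 1.1 .. 1.9 -> 1.10 .. 1.90
        return official + "0"
    return official  # everything else is used as its version number
```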

                        vCard versions correspond to RDFa specs,
                        published in 2001, 2006, 2010 & 2014. They do
                        NOT correspond to vCard data format
                        specifications.

                        
Server switches
A simple web / web socket server is available to provide a GUI for
ssc, and to support a simple service. If this is used for more than
simple tasks, it should be put behind the usual array of good quality
services, such as a firewall, a proxy, and so on. It is not designed
to be robust. The following switches are available:

--server.enable         Enable the server (default disabled)

--server.accept F,T     Accept connections from clients in the address
                        range F to T. If T is omitted, it is F. The
                        default is 127.0.0.1. Non-local address ranges
                        are rejected. May be repeated.
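
An address range test of the kind described above can be sketched with
Python's standard ipaddress module. This is an illustration, not ssc's
code: the function name in_range is invented, and the rejection of
non-local ranges is not modelled.

```python
import ipaddress


def in_range(client, first, last=None):
    """Return True if client lies between first and last, inclusive.

    If last is omitted, the range is the single address first.
    """
    lo = ipaddress.ip_address(first)
    hi = ipaddress.ip_address(last if last is not None else first)
    return lo <= ipaddress.ip_address(client) <= hi
```

All three addresses must be of the same family (IPv4 or IPv6);
comparing mixed families raises a TypeError.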

--server.address A      Serve on this address (default 127.0.0.1). Use
                        * for all addresses.

--server.parameters f   The certificate parameters can be found in the
                        file f.

--server.passfile f     The certificate password can be found in the
                        file f.

--server.password xxx   The certificate password is xxx. This switch is
                        only available in configuration files, and does
                        not work on the command line.

--server.port P         Serve on this port (default 80, until I think
                        of a better one).

--server.private f      The certificate private key can be found in the
                        file f.

--server.public f       The certificate public key can be found in the
                        file f.


Shadow switches
A shadow is a copy of the site being analysed, with, for example, SSIs
resolved, bad content removed, and duplicated content consolidated.

--shadow.changed        When shadowing a site that has been previously
                        shadowed, only copy/link files that have
                        changed.

--shadow.comment        Do not delete comments when writing shadow
                        pages.

--shadow.copy X         Create a shadow directory structure from source
                        HTML files, with errors removed and some things
                        tidied up. X can be:
                                no     copy nothing (default)
                                pages  write 'fixed' source files,
                                       ignore non source files
                                hard   set up hard links to non-source
                                       files (requires source and
                                       shadow directories to be on the
                                       same disk) (see below)
                                soft   set up soft links to non-source
                                       files (see below)
                                all    copy non HTML files too
                                dedu   copy non HTML files, but
                                       deduplicate them, changing links
                                       in HTML source as necessary (see
                                       below)
                                report report duplicates (no
                                       shadowing)
                        ssc cannot convert between versions of HTML,
                        nor between HTML and XHTML.
                        Link options are only available on systems that
                        support filesystem links.
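
How ssc identifies duplicates is not described here. One common
approach, assumed for this Python sketch, is to group files by a hash
of their content. The function name find_duplicates is invented for
the example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(root):
    """Return lists of files under root that share identical content."""
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    # Keep only the groups holding more than one file.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

A deduplicating copy would keep one file from each group and rewrite
links to the others; a report would simply list the groups.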

--shadow.enable         Enable shadowing (set by other shadow options).
                        If shadowing is enabled, but shadow.root is not
                        set, SSC will litter the site source
                        directories with .ndx files.

--shadow.file f         Write ssc's shadow cache to file f, to
                        accelerate future shadowing of the same
                        content once it has been updated.

--shadow.ignore ext     When shadowing, ignore files with this
                        extension (may be repeated).

--shadow.info           Add a comment at or near the top of each
                        shadowed HTML file noting its generation time.

--shadow.msg text       Insert a comment containing text at the top of
                        each generated page. Note that, if any SSI
                        include file is updated, the comment will
                        appear whether or not the original page has
                        changed.

--shadow.root dir       Where to write the shadow site.

--shadow.space          Leave excess/repeated spaces and blank lines in
                        the shadowed files untidily untouched.

--shadow.ssi            Do NOT resolve Server Side Includes when
                        shadowing, even if --general.ssi is set.

--shadow.update         Only examine files that have changed since the
-u                      last time ssc ran. This is incompatible with
                        --corpus.file. This requires --shadow.file.
                        Nits of files that have not changed will not be
                        reported again.

--shadow.virtual v=d    When shadowing virtual directories, output the
                        shadow of virtual directory 'v' to directory
                        'd'. 'v' must match a directory set up using
                        --site.virtual.

Site switches
You will probably need to set some of these switches. For example, if
your website is www.example.com, then you should say so using the
--site.domain switch.

--site.domain domain    The domain name of the site is 'domain'. This
-S domain               can be repeated. This is used to identify any
                        URL that is apparently external but is actually
                        internal to the site.

--site.extension ext    Treat files with this extension as X/HTML
-x ext                  source files. This may be repeated. Files with
                        extension .html are always checked.

--site.index file       This is the name of the default file in a
-i file                 directory. This can be repeated. This is used
                        when checking internal links. The default is
                        index.html.

--site.root dir         This is the root of the website to analyse. ssc
-g dir                  will recursively scan the directory analysing
                        any HTML files it finds. The default is the
                        current directory.

--site.virtual v=d      The virtual directory 'v' is located in actual
-L v=d                  directory 'd' on the local filesystem. For
                        example:
                            --site.virtual virtual=D:\actual

Spell switches
These control spell checking. SSC doesn't actually spell check itself;
it uses spell checking facilities on the host system, so your results
may vary.

--spell.accept XXX      XXX is a correct spelling of a word (or a list
                        of words) in all languages.

--spell.cased           Nitpick correctly spelt but wrongly cased
                        words.

--spell.check           Check text spelling. Uses external spelling
                        checkers, so results will be inconsistent
                        between systems.

--spell.dict LANG,DICT  Unix only. Associate dictionary DICT with LANG.
                        For example, if the standard English dictionary
                        is en_GB-large:
                            --spell.dict en-GB,en_GB-large
                        (Under Windows, ssc uses the OS dictionaries.)

--spell.icu             If "no", do not use the ICU libraries at all
                        (they are rather slow). This will increase the
                        inaccuracy and incorrectness of the spell
                        checks.

--spell.list FN,LANG    The file FN contains a list of valid spellings
                        for language LANG (which may include country
                        info). If LANG is omitted, the valid spellings
                        apply to all languages. For example:
                            --spell.list villages.txt,en-IE
                            --spell.list dorfer.txt,de
                            --spell.list letzstied.txt

--spell.path PATH       Unix only. Path to spelling executable.
                        Hunspell or a compatible program is expected.
                        If none is specified, ssc will seek hunspell.
                        Under Windows, ssc uses the system spell-
                        checker, if there is one.

Stats switches
SSC can output lots of statistical information about the site being
analysed, although by default it outputs nothing. Use --stats.selected
to output a small subset of statistical data, and --stats.all to output
everything. Use --stats.summary to output grand totals, and
--stats.page to output information on each page read. The other
switches allow you to precisely specify what data you want to see. If
you want to output the data to a file, use --stats.export. If you
select both --stats.page and --stats.all, be prepared for rather a lot
of output.

--stats.abbr            Output abbreviation report, so you can verify
                        the same abbreviations have the same expansions
                        across the site.

--stats.all             Output all statistics reports.

--stats.annotation      Output annotation report.

--stats.attribute       Output element attribute report, which expands
                        the element report to output information about
                        attributes used.

--stats.category        Output category report, which outputs the total
                        quantity of nits reported by nit category.

--stats.character-variant Output character variant report.

--stats.class           Output class report, which allows you to see
                        which classes are defined in CSS but not used,
                        which classes are used but not defined, as well
                        as a count of both for all classes encountered.

--stats.content-name    Output content name report.

--stats.counter-style   Output counter style report.

--stats.css-property    Output css property report, which gives you an
                        idea of the sophistication of the CSS used on
                        the site.

--stats.custom-media    Output custom media report, which lists all
                        named custom media definitions encountered.

--stats.custom-property Output custom property report, which lists all
                        named custom property definitions encountered.

--stats.definition      Output definitions report, so you can verify
                        the same terms have the same definitions across
                        the site.

--stats.element         Output element report, which totals all
                        elements encountered across the site.

--stats.error           Output counts of errors, warnings, etc.

--stats.export F        Export to file F.

--stats.file            Output file report, which reports the number of
                        pages processed, and summarises file sizes.

--stats.font            Output font report, which lists all fonts used
                        across the site.

--stats.font-family     Output font family report, which lists all font
                        families named across the site.

--stats.highlight       Output highlight report.

--stats.historical-form Output historical font form report.

--stats.id              Output id report, allowing you to identify
                        which ids are styled but not mentioned.

--stats.itemid          Output itemid report, which gives you an idea
                        of the ontological significance and depth of
                        the site.

--stats.keyframe        Output keyframe report, which lists all named
                        keyframes.

--stats.layer           Output layer report, which lists all named
                        layers.

--stats.meta            Produce statistics on <META> usage in <HEAD>.
                        Note that pragmas reported (http-equiv) are
                        those found in the HTML source, not those
                        returned by the HTTP protocol. Remember that
                        many web servers (not all) will remove some
                        pragmas when serving pages.

--stats.name-value      Output name/value pairs report, which helps
                        you identify inconsistencies between
                        definitions across the site.

--stats.ontology        Output ontology report, which gives an insight
                        into the ontological depth of the site being
                        analysed.

--stats.ornament        Output ornament report, which reports all named
                        CSS font ornaments encountered.

--stats.page            Produce statistics for each source file
                        encountered.

--stats.page-name       Output page name report, which reports all
                        named CSS page-names encountered.

--stats.palette         Output palette report, which reports all named
                        CSS palettes encountered.

--stats.property        Output ontology property count report, as an
                        addendum to --stats.ontology.

--stats.reference       Output reference report, which identifies, as
                        precisely as it can, which versions of HTML,
                        XHTML, CSS, etc., are found.

--stats.region          Output region report, which reports all CSS
                        named regions encountered.

--stats.scroll-anim     Output scroll animation report, which reports
                        all CSS named scroll animations encountered.

--stats.selected        Output a selected set of reports; may be
                        modified by other stats switches.

--stats.statement       Output CSS statement report, which summarises
                        all CSS statements encountered.

--stats.styleset        Output styleset report, which reports all CSS
                        named stylesets encountered.

--stats.stylistic       Output stylistic report, which reports all CSS
                        named stylistics encountered, excluding the
                        band themselves.

--stats.summary         Produce a summary of overall statistics for the
                        website, including grand totals.

--stats.swash           Output swash report, which reports all CSS
                        named swashes encountered.

--stats.version         Output version report, which summarises
                        versions of HTML, SVG, MathML, etc.,
                        encountered.

--stats.view            Output view report, which reports all CSS named
                        views encountered.

SVG switch
If you want to analyse some SVG that is not contemporary with the HTML
being analysed, you may find the --svg.version switch useful.

--svg.version x         Presume any SVG code encountered is this
                        version, unless the SVG code itself specifies a
                        version. Versions recognised:
                            1.0
                            1.1
                            1.2      (really 1.2/tiny)
                            1.2/tiny
                            1.2/full (May 2004 draft, incomplete, any
                                      conflict with 1.2/tiny always
                                      resolves in favour of 1.2/tiny)
                            2.0
                            2.1 (April 2021 draft)
                        If this switch is not used, and some SVG code
                        does not identify its version, the version is
                        derived from the version of the host X/HTML
                        code.
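
The order of precedence described above can be sketched in a few lines
of Python. This is an illustration only, not ssc's own code: the
function names are hypothetical, and the mapping from a host X/HTML
version to an SVG version is not modelled, so it is passed in as a
ready-made value.

```python
# Versions recognised by --svg.version, as listed above.
# "1.2" is documented as meaning 1.2/tiny.
ALIASES = {"1.2": "1.2/tiny"}
RECOGNISED = {"1.0", "1.1", "1.2/tiny", "1.2/full", "2.0", "2.1"}


def normalise(version):
    """Resolve the documented alias and reject unknown versions."""
    version = ALIASES.get(version, version)
    if version not in RECOGNISED:
        raise ValueError(f"unrecognised SVG version: {version}")
    return version


def effective_svg_version(declared=None, switch=None, host_default=None):
    """Pick an SVG version (hypothetical helper).

    declared     -- version named by the SVG code itself, if any
    switch       -- value given to --svg.version, if any
    host_default -- version derived from the host X/HTML code
    """
    if declared is not None:   # the SVG code itself always wins
        return normalise(declared)
    if switch is not None:     # otherwise presume the switch value
        return normalise(switch)
    return host_default        # otherwise fall back on the host


print(effective_svg_version(declared="1.1", switch="2.0"))
print(effective_svg_version(switch="1.2"))
```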

Validation switches
These switches are only useful if you have bespoke HTML and CSS on your
website. They allow you to define additional valid values of certain
data types. Start with the --validation switch, and go on from there.

--validation            Only available from the command line. Lists all
                        types that can be given additional valid
                        values.

--validation.attribute ATT
                        Add the custom attribute ATT. This attribute
                        will be ignored, not validated. ATT may
                        optionally be a series of comma separated
                        values: name,namespace,flags,flags2
                        The possible values of flags and flags2 can be
                        understood by looking at the source.

--validation.charset CH Accept CH as a charset. May be repeated.

--validation.class CL   Add the valid class CL. May be repeated.

--validation.color COL  Accept COL as a colour. May be repeated.

--validation.colour COL Accept COL as a colour. May be repeated.

--validation.country CC Accept CC as a valid two-letter country code.
                        May be repeated.

--validation.currency CUR
                        Accept CUR as a valid currency. May be
                        repeated.

--validation.element EL Accept <EL> as a valid element. This element
                        will be ignored, not validated. EL may
                        optionally be a series of comma separated
                        values: name,namespace,flags,flags2
                        The possible values of flags and flags2 can be
                        understood by looking at the source. May be
                        repeated.

--validation.element-attribute EL,ATT
                        Accept the known attribute ATT on the element
                        <EL>. Doesn't work with namespaces (names
                        containing ':'). May be repeated.

--validation.extension EXT
                        Accept the extension EXT as a mimetype file
                        extension. May be repeated.

--validation.ff FEATURE Accept FEATURE as a CSS font feature. These
                        should normally be four characters long. May be
                        repeated.

--validation.ff VARIATION
                        Accept VARIATION as a CSS font variation. These
                        should normally be four characters long. May be
                        repeated.

--validation.httpequiv HEQ
                        Accept HEQ as a valid macro for httpequiv on
                        <META> elements. May be repeated.

--validation.lang LANG  Accept LANG as a valid language code. May be
                        repeated.

--validation.minor x    When validating W3 HTML 5 source code, use
-m x                    this minor version of W3 HTML 5. Valid values
                        are 0, 1, 2, and 3 (draft). WhatWG versions are
                        determined by date, corresponding roughly to
                        the date of the (online) publication of the
                        specific version. See the --html.version
                        switch.

--validation.metaname M
                        Accept M as valid for the NAME attribute of the
                        <META> element. The VALUE will be ignored. May
                        be repeated.

--validation.microdata  Validate (schema.org) microdata.

--validation.mimetype MT
                        Accept MT as a valid mimetype. May be repeated.

--validation.sgml SGML  Accept SGML as a valid SGML schema
                        identification (as found in <!DOCTYPE ...>). May
                        be repeated.

--validation.XXX YYY    Accept YYY as a valid value for attribute type
                        XXX. For a list of possible values of XXX, use
                        the command line switch --validation.



CONFIGURATION FILE FORMAT

If a configuration file is used, it should be in INI file format. All
content is optional.

Section and option names are derived from the long form switch name,
which consists of --SECTION.OPTION, laid out in the format:

[SECTION]
OPTION=
OPTION=123456

Switches that do not have a long form version cannot be used in a
configuration file.

Each ssc test (in the recipe/toast folder) has a configuration file;
browse them for examples.
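
As a rough illustration of the --SECTION.OPTION rule, the Python sketch
below turns a few long form switches into INI text. The helper name is
hypothetical and this is not how ssc reads its configuration. It also
assumes each option appears once, so it says nothing about how switches
marked "May be repeated" should be written in a configuration file.

```python
import configparser
import io


def switches_to_ini(switches):
    """Convert (switch, value) pairs into INI text (hypothetical helper).

    Each switch must be in the long form --SECTION.OPTION; use an empty
    string as the value of a switch that takes no argument.
    """
    config = configparser.ConfigParser(interpolation=None)
    config.optionxform = str  # keep option names exactly as written
    for switch, value in switches:
        section, _, option = switch.lstrip("-").partition(".")
        if not section or not option:
            raise ValueError(f"not a long form switch: {switch}")
        if not config.has_section(section):
            config.add_section(section)
        config.set(section, option, value)
    buffer = io.StringIO()
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


print(switches_to_ini([
    ("--html.version", "5.2"),
    ("--svg.version", "1.1"),
    ("--stats.summary", ""),
    ("--stats.meta", ""),
]))
```

The switches used here are all documented above; the values are just
examples.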



ENVIRONMENT

If you set --general.cgi, ssc will check these environment variables:

QUERY_STRING            Run under OpenBSD's httpd server. See notes
                        below.
SSC_CONFIG              If no configuration file is given on the
                        command line, use this one
SSC_ARGS                Preliminary command line parameters

If, when SSC is run, the environment variable QUERY_STRING is set to an
OpenBSD httpd server CGI value that includes the parameter
html.snippet, then SSC will nitpick that snippet only. Some other
parameters are processed, including general.verbose and html.version.
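
To experiment with such a value, the Python sketch below builds and
reads back a query string carrying html.snippet. It assumes the usual
URL-encoded CGI form (name=value pairs joined by '&'); exactly how ssc
decodes QUERY_STRING is not described here, so treat that as an
assumption. The function names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode


def build_query(snippet, html_version=None, verbose=None):
    """Build a URL-encoded query string (assumed format)."""
    params = {"html.snippet": snippet}
    if html_version is not None:
        params["html.version"] = html_version
    if verbose is not None:
        params["general.verbose"] = str(verbose)
    return urlencode(params)


def read_query(query_string):
    """Decode a query string into a dict, keeping the first value."""
    parsed = parse_qs(query_string, keep_blank_values=True)
    return {name: values[0] for name, values in parsed.items()}


query = build_query("<p>Hello & goodbye</p>", html_version="5.2", verbose=3)
print(query)
print(read_query(query))
```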



EXIT STATUS

If no significant nits are found, ssc exits with 0, otherwise it exits
with a value > 0. See the --general.error switch.
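
A script can use the exit status to decide whether to carry on, for
example before uploading a site. A minimal Python sketch follows; the
function name is hypothetical, and the command passed in is whatever
you would normally type.

```python
import subprocess
import sys


def site_is_clean(command):
    """Run a checker command; True only if it exits with status 0."""
    completed = subprocess.run(command, check=False)
    if completed.returncode != 0:
        print(f"{command[0]} exited with {completed.returncode}",
              file=sys.stderr)
    return completed.returncode == 0


# Typical use, assuming ssc is on the PATH and site.conf exists:
#     if site_is_clean(["ssc", "-f", "site.conf"]):
#         ...upload the site...
```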



OUTPUT TEMPLATE

The --nit.format switch allows control of output format. It takes a
file name. The format of that text file is a sequence of fixed section
names, enclosed in square brackets on their own lines, each optionally
followed by text. In that text, certain specific identifiers, enclosed
in brace pairs, are substituted. For example:

[dog-section]
My dog {{dog-name}} is a {{bad-dog}}.

For examples, browse recipe/toast/output/*.nit

If no file is specified, or if the file cannot be loaded, a default
template is used.

Note also the --nit.quote switch.
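
The layout described above (bracketed section names, each followed by
text containing identifiers in double braces) can be mimicked in Python
as follows. This only illustrates the file format, using the made-up
dog example; the real section names and identifiers are fixed by ssc,
and this is not its parser.

```python
import re

SECTION = re.compile(r"^\[([^\]]+)\]\s*$")
IDENTIFIER = re.compile(r"\{\{([^{}]+)\}\}")


def parse_template(text):
    """Split template text into {section name: section text}."""
    sections = {}
    current = None
    for line in text.splitlines():
        match = SECTION.match(line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(lines) for name, lines in sections.items()}


def render(section_text, values):
    """Substitute {{identifier}}; unknown identifiers are left alone."""
    return IDENTIFIER.sub(
        lambda m: str(values.get(m.group(1), m.group(0))), section_text)


template = "[dog-section]\nMy dog {{dog-name}} is a {{bad-dog}}.\n"
sections = parse_template(template)
print(render(sections["dog-section"],
             {"dog-name": "Rex", "bad-dog": "good dog"}))
```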



EXAMPLES

To verify the version of ssc:
ssc -V

To check the static web site source directory /home/site/wwwroot:
ssc /home/site/wwwroot

To check a static HTML/XHTML website for example.com, that uses server
side includes, in the current directory, with verification of external
links, with rather verbose output:
ssc -e -I -x html -x shtml -s example.com -v 5 -i index.shtml

To check a static web site in the current directory, with a virtual
directory, verifying microformats:
ssc -L virtual=/home/site/virtual -M

To check a static web site using a configuration file:
ssc -f config.file

A simple configuration file might contain:

[general]
verbose=4
output=simple.out
[site]
domain=example.edu
extension=html
index=index.html
root=simple

A configuration file to check a site against HTML 5.2 and SVG 1.1 might
contain:

[general]
output=site.out
class=
[link]
check=
[site]
domain=example.edu
extension=html
index=index.html
root=site
[html]
version=5.2
[svg]
version=1.1

A configuration file to check against a particular WhatWG living
standard, gathering statistics:

[general]
output=jan21.out
[html]
version=2021/01/01
[link]
check=
[microdata]
version=11.0
[site]
domain=example.edu
extension=html
index=index.html
root=site
[stats]
summary=
meta=

A configuration file to shadow copy and deduplicate a site might
contain:

[general]
output=dedu.out
class=
[site]
domain=example.edu
extension=html
index=index.html
root=site
[shadow]
copy=5
root=shadow
file=dedu.ndx

A configuration file to export microdata, validated against schema.org
version 7.2, might contain:

[general]
output=export.out
class=
[site]
domain=example.edu
extension=html
index=index.html
root=site
[link]
check=
[microdata]
export=
root=export
version=7.2

Example conf files can be found scattered across the test suite, in
particular in recipe/toast/conf/other and recipe/toast/conf/sites.
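
When reading one of these files, it can help to see which long form
switches it stands for. The Python sketch below lists them, assuming
only the [SECTION] OPTION=value rule described under CONFIGURATION FILE
FORMAT. The helper name is hypothetical, and Python's configparser
rejects repeated options, so files that repeat an option are out of
scope.

```python
import configparser


def ini_to_switches(text):
    """List the --SECTION.OPTION switches implied by INI text."""
    config = configparser.ConfigParser(interpolation=None)
    config.optionxform = str  # keep option names exactly as written
    config.read_string(text)
    switches = []
    for section in config.sections():
        for option, value in config.items(section):
            switches.append(f"--{section}.{option}")
            if value:  # options written as "OPTION=" carry no argument
                switches.append(value)
    return switches


example = """\
[general]
output=site.out
class=
[html]
version=5.2
[svg]
version=1.1
"""
print(" ".join(ini_to_switches(example)))
```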



PREPARING and UPDATING a SITE

These notes are based on the steps I take to update an OpenBSD website.

Presume a directory containing the following:
site.conf    ssc configuration file for a website
site         shadow output produced by ssc

Then I run a script like this:

ssc -f site.conf
upload.sh site /var/www/site-upload server user 0
ssh user@server "cd /var/www ; mv site x ; mv site-upload site ;
mv x site-upload ; ln -sf site htdocs"

upload.sh is a macos bash script that can be found among the source
code. Note that I have rather naughtily replaced OpenBSD's httpd
document directory /var/www/htdocs with a link.

The conf file can be found at recipe/toast/conf/sites/live.conf.
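
The directory swap performed by the ssh command can be tried out
locally with the Python sketch below. It is an illustration under
assumptions, not part of ssc: the function name is hypothetical, it
works on whatever directory you give it, and it replaces the htdocs
link outright rather than calling ln.

```python
from pathlib import Path


def rotate_site(www):
    """Swap site and site-upload under www, then point htdocs at site."""
    www = Path(www)
    live, upload, spare = www / "site", www / "site-upload", www / "x"
    live.rename(spare)     # mv site x
    upload.rename(live)    # mv site-upload site
    spare.rename(upload)   # mv x site-upload
    link = www / "htdocs"
    if link.is_symlink():
        link.unlink()      # drop any old link before recreating it
    link.symlink_to("site", target_is_directory=True)
```

After the swap, the previous live copy sits in site-upload, ready to
be overwritten by the next upload.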



SEE ALSO

tidy
linkchecker



HISTORY

ssc (ssc.lu) is written by Dylan Harris (dylanharris.org)

known issues

SSC is α software. It doesn’t do what it’s supposed to do, and what it’s supposed to do is wrong.

Note that github hosts a list of known issues.

* How can such a dangerous animal have such a cuddly name? It’s like calling a Hound of Hell ‘Fluffy’, or Death’s horse Binky.


bug reporting

SSC is α software. It may contain unexpected features. If you encounter such a delight, please help improve ssc by collecting the following information (where relevant):

and emailing everything to mail@ssc.lu (if the collected files are more than small, please use a public fileserver and email the link). Do NOT send anything confidential. Furthermore, unless you request otherwise, we reserve the right to publish some or all of the information sent in future versions of ssc, usually in the test suite. If you have a fix, you are invited to submit a pull request on github. Thank you.


build

BUILD NOTES
static site checker
https://ssc.lu/
(c) 2020-2024 Dylan Harris


Introduction
============
SSC can be built on various unii with CMake and clang or gcc for C++ 17
or better, or Visual Studios 2017 / 2019 / 2022 under Windows. I have
built & tested it in various OSs on some amd64 & arm64 architectures.

Although ssc builds with older compilers on some older systems, not all
features are available.


Libraries
=========

Common dependencies
-------------------
ssc needs boost version 1.75 or better (https://boost.org),
a recent copy of the ICU libraries (https://icu-project.org/) (or
define NOICU), Microsoft's GSL (https://github.com/Microsoft/GSL)
(or define NO_GSL), and a recent version of libcurl (https://curl.se/)*
(or define NOCURL). If you want to experiment with the
still-in-development GUI version, you'll also need a recent version of
wxWidgets. Usually, an operating system's package manager has
appropriate versions ready to install.

You may need to set these environment variables:
- BOOST: if you're not using your operating system's packaged flavour
  of boost, then set BOOST to your boost source root directory;
- CURL: if you're not using your operating system's packaged flavour
  of curl, then set CURL to your curl source root directory;
- GSL: set it to your GSL root directory;
- ICU_ROOT: if you're not using your operating system's packaged ICU,
  set ICU_ROOT to your ICU source root directory;
- WX_ROOT: if you're building the gui front end to ssc, you'll need to
  install wxWidgets and set WX_ROOT to its installation directory.

*libcurl requires a thread-safe underlying SSL library: see
https://curl.se/libcurl/c/threadsafe.html. 

Note that the Windows solutions no longer require vcpkg; it proved too
unreliable. However, if you wish to use it, go ahead: you may have
better luck than me.

hunspell
--------
Building SSC under unii, including macos, requires a development
installation of hunspell (https://hunspell.github.io/).

winspell
--------
The Windows build, by default, uses the native Windows spellchecker,
although, preceding Windows 11, that doesn't seem to work so well in
contexts unimpaired by monolingualism.


Notes on the GUI
================

wxWidgets
---------
Why use this ancient behemoth given the good number of somewhat less
archaic C++ GUI libraries? The requirements were: (i) Open Source;
(ii) supports Windows/MacOS/Linux/OpenBSD. Of those libraries I found,
only wxWidgets was documented to support OpenBSD. 

Polylingualism
--------------
ssc is written for coders. HTML/etc. code is based on English, so
ssc's GUI text is similarly monolingual.

Unstable
--------
The GUI will evolve rapidly over the coming few months, so expect it to
change significantly.


Building
========

Windows
-------
To build from Visual Studio, navigate to recipe/tea, open the
appropriate .sln file, then build. Only Visual Studios 2017, 2019 and
2022 have been built & tested, for amd64 (x64) and arm64 (M2), under
Windows 10 & 11.

On low memory machines, disable the /MP switch.

The Visual Studio solutions use vcpkg, which resolves dependencies.
For all versions, except recent editions of Visual Studio 2022, you may
need to first download and install vcpkg yourself, from
https://vcpkg.io/.


Unii & mock Unii
----------------
You will need CMake 3.19 or better. On Linux, you will also need
lsb-release. These can be found in most distributions' standard
packages. For macos, I used macports, but I expect brew is good too.
From the home ssc directory, compile a normal build thus:
cmake .
make
ctest
make install

For a debug build:
cmake -DCMAKE_BUILD_TYPE=Debug .
make
ctest
make install

If everything works correctly, then everything will be built, a series
of tests run, with a final result at the very end saying no failures.
Having said that, given SSC is alpha, don't be too surprised to see
some warnings or some final test errors. Note in particular that
complaints about being unable to find or copy files during testing are
not of concern. These come from scripts that set up or tear down
individual tests; the standard commands used sometimes complain if
they can't find files they're supposed to delete, which is a bit silly
given that means things are already in the required state.

ssc has been successfully built in OpenBSD, FreeBSD, Linux & MacOS on
AMD64, and in recent versions of Linux & MacOS under ARM64. 

The current version of ssc requires the current version of an operating
system. Older operating systems require older versions of ssc. Not all
features work on all systems.

I've sometimes found it necessary to use cmake's
-DCMAKE_CXX_COMPILER=... switch.

Centos 9
--------
The appropriate CMake command is:
  cmake . -DFLAVOUR=CentosOSStream -DFLAVOUR_VER=9
(note the standard English spelling of flavour.)

OpenBSD
-------
You may need to increase significantly the available memory setting
for your build account in login.conf.

Macos
-----
Certain versions of macos clang produce buggy code, whether or not
optimisations are applied. Use an alternative compiler if you want
a stable executable. I accept a bug could be in ssc code, but I've not
found it.


Testing
=======

Windows
-------
Under Visual Studio, run ssc??-test using these arguments:
  -v -x $(ProjectDir)..\..\ssc.exe
    -f $(ProjectDir)..\toast\ssc-test\win.lst
(on one line)

Add '-d' if you want the test utility to retain temporary files.

CMake
-----
Under CMake, run ctest:
  ctest -V
(which runs ssc-test for you, using nix.lst).

Dimitude
--------
The testing utility is rather dim; it will test unbuilt features,
causing failures.

Spelling test results depend on the dictionaries installed.


Supporting libraries
====================

GSL
---
If you can't find a copy of Microsoft's GSL in your system's standard
package suite, then grab a current copy from its github repository
(https://github.com/Microsoft/GSL), then unpack, build and install it.
In Windows, remember to add its root directory to your local path.

Boost
-----
Boost is to C++ as breakfast to the working day.

Most package managers support it, including vcpkg. Alternatively, build
your own version using the source found at boost.org.

Curl
----
curl is used for link checking and, where necessary, obtaining remote
resources. Most package managers support it.

wxWidgets
---------
This is only required if you make a GUI build, which is not recommended
(yet). It can be found at wxwidgets.org. Note that I have not tested it
under many supported (by wxWidgets) systems.


Editions
========

Currently, there are new, work-in-progress, server and gui editions in
the Visual Studio solution. These do not work and should not be used.
Stick to the standard edition.

notes

If everything works correctly, then everything will be built, a series of tests run, with a final result at the very end saying no failures. Having said that, given SSC is α, don’t be too surprised to see some warnings or some final test errors. Worse, some tests have dependencies that vary across systems, which can cause spurious test failures.


source

0.2.4

0.2.3

0.2.2

0.2.1

0.2.0

0.1.60

0.1.59

0.1.58

0.1.57

0.1.56

0.1.55

0.1.54

0.1.53

0.1.52

0.1.51

0.1.50

0.1.49

0.1.48

0.1.47

0.1.46

0.1.45

0.1.44

0.1.43

0.1.42

0.1.41

0.1.40

0.1.39

0.1.38

0.1.37

0.1.36

0.1.35

0.1.34

0.1.33

0.1.32

0.1.31

0.1.30

0.1.29

0.1.28

0.1.27

0.1.26

0.1.25

0.1.24

0.1.23

0.1.22

0.1.21

0.1.19

0.1.18

0.1.17

0.1.16

0.1.15

0.1.14

0.1.13

0.1.12

0.1.11

0.1.10

0.1.9

0.1.8

0.1.7

0.1.6

0.1.5

0.1.4

0.1.3

0.1.2

0.1.1

0.1.0

0.0.134

0.0.133

0.0.132

0.0.131

0.0.130

0.0.129

0.0.128

0.0.127

0.0.126

0.0.125

0.0.124

0.0.123

0.0.122

0.0.121

0.0.120

0.0.119

0.0.118

0.0.117

0.0.116

0.0.115

0.0.114

0.0.113

0.0.112

0.0.111

0.0.110

0.0.109

0.0.108

0.0.107

0.0.106

0.0.105

0.0.104

0.0.103

0.0.102

0.0.101

0.0.100

0.0.99

0.0.98

0.0.97

0.0.96

0.0.95

0.0.94

0.0.93

0.0.92

0.0.91

0.0.90

0.0.89

0.0.88

0.0.87

0.0.86

0.0.85

0.0.84

0.0.83

0.0.82

0.0.81

0.0.80

0.0.79

0.0.78

0.0.77

0.0.76

0.0.75

0.0.74

0.0.73

0.0.71

0.0.70

0.0.60

0.0.55

0.0.2


boot notes

Notes on folder names:

These reference documents are hoovered from various open source sites. They’re collected here for convenience; at all times the originals are correct. The subjects are: aria, activity streams, bibo, creative commons, charsets, content (RDF), content security policy, cascading style sheets, csvw, common tag, dataset quality, dbpedia, dublin core, data catalogue, did, document object model, domain, data quality, data usage, earl, ebu, fibo, foaf, good relations, grddl, HTML 1, HTML 2, HTML 3, HTML 4, HTTP, ical, its, javascript, json, lang, link relations, locn, ma-ont, marinetlo, mathML, media capture, microdata, mime, music, ns, web annotation, odrl, open graph, ontologies, openmath, org, other, owl, p3p, powder, prov, pso, qb, rddl, RDFa, RDFa, rif, schema.org, sd, sioc, skos, sm, smil, smpte, sosa, ssn, svg, time, ttml, url, vann, vcard, void, W3, webgl, webmention, whatwg, XHTML, xhv, XML, xsd, xsl, XSLT.


copyright & licence

Any dispute shall be resolved in accordance with the law of the Grand Duchy of Luxembourg.


SSC

SSC, static site checker, https://ssc.lu/
copyright (c) 2020-2024 dylan harris

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA


W3

Some test files come from w3.org (some directly, in W3 documents, etc.), and are licensed as follows:

License

By obtaining and/or copying this work, you (the licensee) agree that you have read, understood, and will comply with the following terms and conditions.

Permission to copy, modify, and distribute this work, with or without modification, for any purpose and without fee or royalty is hereby granted, provided that you include the following on ALL copies of the work or portions thereof, including modifications:

    The full text of this NOTICE in a location viewable to users of the redistributed or derivative work.
    Any pre-existing intellectual property disclaimers, notices, or terms and conditions. If none exist, the W3C Software and Document Short Notice should be included.
    Notice of any changes or modifications, through a copyright statement on the new code or document such as
    "This software or document includes material copied from or derived from [title and URI of the W3C document]. Copyright © [YEAR] W3C® (MIT, ERCIM, Keio, Beihang)."

Disclaimers

THIS WORK IS PROVIDED "AS IS," AND COPYRIGHT HOLDERS MAKE NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO, WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE SOFTWARE OR DOCUMENT WILL NOT INFRINGE ANY THIRD PARTY PATENTS, COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS.

COPYRIGHT HOLDERS WILL NOT BE LIABLE FOR ANY DIRECT, INDIRECT, SPECIAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF ANY USE OF THE SOFTWARE OR DOCUMENT.

The name and trademarks of copyright holders may NOT be used in advertising or publicity pertaining to the work without specific, written prior permission. Title to copyright in this work will at all times remain with copyright holders.

Notes

This version: http://www.w3.org/Consortium/Legal/2015/copyright-software-and-document

Previous version: http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231

This version makes clear that the license is applicable to both software and text, by changing the name and substituting "work" for instances of "software and its documentation." It moves "notice of changes or modifications to the files" to the copyright notice, to make clear that the license is compatible with other liberal licenses.


WhatWG

Some test files come from whatwg.org (some directly, in WhatWG documents, etc.), and are licensed under a Creative Commons Attribution 4.0 International License. See https://whatwg.org/ for details.


corruptpress.com

Some test files are derived from pages at corruptpress.com. They are licensed under a Creative Commons Attribution 4.0 International License. Browse https://corruptpress.com/ for details.


dylanharris.org

Some test files are derived from pages at https://dylanharris.org/. They are licensed under a Creative Commons Attribution 4.0 International License. Browse https://dylanharris.org/ for details.