shadow arguments

A shadow is a copy of the site being analysed, with, for example, SSIs resolved, bad content removed, and duplicated content consolidated.

shadow.changed When shadowing a site that has been previously shadowed, only copy/link files that have changed.
shadow.comment Do not delete comments when writing shadow pages.
shadow.copy X Create a shadow directory structure from source HTML files, with errors removed and some things tidied up. X can be:
no copy nothing (default)
pages write ‘fixed’ source files, ignore non-source files
hard set up hard links to non-source files (requires source and shadow directories to be on the same disk) (see below)
soft set up soft links to non-source files (see below)
all copy non-HTML files too
dedu copy non-HTML files, but deduplicate them, changing links in HTML source as necessary (see below)
report report duplicates (no shadowing)
ssc cannot convert between versions of HTML, nor between HTML and XHTML.
Link options are only available on systems that support filesystem links.
shadow.enable Enable shadowing (set by other shadow options). If shadowing is enabled, but shadow.root is not set, SSC will litter the site source directories with .ndx files.
shadow.file f Write ssc’s shadow cache to file f, to accelerate future shadowing of the same content after it has been updated.
shadow.ignore ext When shadowing, ignore files with this extension (may be repeated).
shadow.info Add a comment at or near the top of each shadowed HTML file noting its generation time.
shadow.msg text Insert a comment containing text at the top of each generated page. Note that, if any SSI include file is updated, the comment will appear whether or not the original page has changed.
shadow.naughty value Add value to censorship red list; censorship is discussed below.
shadow.nice value Add value to censorship green list; censorship is discussed below.
shadow.note value Add value to censorship blue list; censorship is discussed below.
shadow.root dir Where to write the shadow site.
shadow.space Leave excess/repeated spaces and blank lines in the shadowed files untidily untouched.
shadow.ssi Do NOT resolve Server Side Includes when shadowing, even if general.ssi is set.
shadow.update (-u) Only examine files that have changed since the last time ssc ran. This requires shadow.file and is incompatible with corpus.file. Nits of files that have not changed will not be reported again.
shadow.virtual v=d When shadowing virtual directories, output the shadow of virtual directory ‘v’ to directory ‘d’. ‘v’ must match a directory set up using site.virtual.
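
As an illustration of what the report and dedu values of shadow.copy involve, the following is a minimal Python sketch of finding duplicated files by content and deciding which copy to keep. It is not ssc's implementation: the function names find_duplicates and link_rewrites are hypothetical, and comparing SHA-256 digests is an assumption about how duplicates might be detected.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(root):
    """Group files under root by SHA-256 digest; return groups of 2+ files."""
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]


def link_rewrites(groups):
    """Map each duplicate to the first (kept) file in its group.

    A deduplicating shadow would copy only the kept file and point
    links in the HTML source at it instead of at the duplicates.
    """
    rewrites = {}
    for paths in groups:
        keeper = paths[0]
        for duplicate in paths[1:]:
            rewrites[duplicate] = keeper
    return rewrites
```

Calling find_duplicates alone corresponds roughly to reporting duplicates; applying the mapping from link_rewrites corresponds roughly to deduplicating them.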

CENSORSHIP

Certain pieces of data can be censored when shadowing, using the shadow.naughty, shadow.nice, and shadow.note switches (respectively defining a red list, a green list, and a blue list).

Each switch takes a command parameter, which is formatted as follows:

ATTRIBUTE,element,attribute,value
considers attribute of element with the given value

ELEMENT,element,value
considers the text contained within <ELEMENT> and </ELEMENT> of the given value

MICROFORMAT,type,property,value
considers a microformat type property of the given value

ONTOLOGY,ontology,type,property,value
considers an ontological type property of the given value

COMMON,value
considers that value anywhere
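
The five parameter formats above can be sketched as a small Python parser. This is only an illustration of the format, not ssc's own parsing code: the names FIELD_COUNTS, CensorRule and parse_rule are hypothetical, and it assumes that keywords are written in upper case and that the value is the final field and may itself contain commas.

```python
from typing import NamedTuple

# Number of comma-separated fields after the keyword, per the formats above.
FIELD_COUNTS = {
    "ATTRIBUTE": 3,    # element, attribute, value
    "ELEMENT": 2,      # element, value
    "MICROFORMAT": 3,  # type, property, value
    "ONTOLOGY": 4,     # ontology, type, property, value
    "COMMON": 1,       # value
}


class CensorRule(NamedTuple):
    kind: str     # one of the keywords in FIELD_COUNTS
    scope: tuple  # everything between the keyword and the value
    value: str    # the value to match or insert


def parse_rule(parameter):
    """Split a parameter such as 'ELEMENT,title,Nice Title' into its parts."""
    kind, sep, rest = parameter.partition(",")
    if not sep or kind not in FIELD_COUNTS:
        raise ValueError(f"unrecognised censorship parameter: {parameter!r}")
    # Assumption: the value is the last field and may contain commas.
    fields = rest.split(",", FIELD_COUNTS[kind] - 1)
    if len(fields) != FIELD_COUNTS[kind]:
        raise ValueError(f"too few fields in: {parameter!r}")
    return CensorRule(kind, tuple(fields[:-1]), fields[-1])
```

For instance, parse_rule("ELEMENT,title,Nice Title") yields a rule of kind ELEMENT, scope ("title",) and value "Nice Title".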

You could, for example, use --shadow.nice repeatedly to define half a dozen values for an element in a green list. This would mean all values apart from those specified would be suppressed (similar to a white list).

You could, for example, use --shadow.naughty repeatedly to define half a dozen values for an element in a red list. This would mean any occurrence of any of those values would be suppressed (similar to a black list).

When a value is suppressed, ssc looks for a matching blue list entry created with --shadow.note. If the blue list contains a value for the element, that value is inserted in place of the old value; otherwise the original is blanked.

For example:
--shadow.nice ELEMENT,title,Nice Title
--shadow.nice ELEMENT,title,Title
--shadow.note ELEMENT,title,[REDACTED]

If a page’s <TITLE></TITLE> contains any text apart from Title or Nice Title, it is replaced by [REDACTED].
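
The example above can be modelled in a few lines of Python. This is a sketch of the behaviour as described in this section, not ssc's code: the function censor is hypothetical, matching is by exact string comparison only, and how ssc combines a green list and a red list for the same element is an assumption here (a value is suppressed if it is missing from a non-empty green list or present in the red list).

```python
def censor(text, nice=(), naughty=(), note=None):
    """Return text unchanged, or its replacement if the lists suppress it.

    nice    -- green list: if non-empty, only these values are allowed
    naughty -- red list: these values are always suppressed
    note    -- blue list value to substitute; None blanks the text
    """
    # Assumption: either list on its own is enough to suppress a value.
    suppressed = (bool(nice) and text not in nice) or text in naughty
    if not suppressed:
        return text
    return note if note is not None else ""


# The example above: two green list titles and one blue list replacement.
nice_titles = ["Nice Title", "Title"]
for title in ["Title", "Nice Title", "Secret Plans"]:
    print(repr(title), "->", repr(censor(title, nice=nice_titles, note="[REDACTED]")))
```

With these lists, Title and Nice Title pass through unchanged, while Secret Plans becomes [REDACTED]; without the note argument it would be blanked instead.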

Wildcards can be used for naughty and nice, using Perl syntax.

Dylan Harris
August 2025