configuration files

the basics

To check a static web site using a configuration file:

ssc -f config.file

A configuration file is in INI file format, which is basically a sequence of entries looking like:

[section]
setting=value
setting=value
setting=

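The section names and settings correspond to the long switch names: the part before the dot is the section name, and the part after the dot is the setting name. For example, the switch name general.ssi corresponds to the ssi setting in the [general] section.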

Boolean settings require the ‘=’ with no value after the setting name.
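
The mapping between dotted switch names and INI entries can be sketched in a few lines of Python. This is only an illustration, not part of ssc: the function name to_ini is made up, and splitting at the first dot is an assumption based on settings such as schema.org, which contain a dot themselves.

```python
def to_ini(settings):
    """Group (dotted_name, value) pairs into INI text.

    A value of None stands for a boolean setting, written as 'name='.
    Splitting at the FIRST dot is an assumption, not documented ssc behaviour.
    """
    sections = {}
    for dotted, value in settings:
        section, _, name = dotted.partition(".")
        sections.setdefault(section, []).append((name, value))

    lines = []
    for section, entries in sections.items():
        if lines:
            lines.append("")  # blank line between sections
        lines.append(f"[{section}]")
        for name, value in entries:
            lines.append(f"{name}={'' if value is None else value}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_ini([
        ("general.verbose", "5"),
        ("general.class", None),         # boolean: '=' with no value
        ("site.domain", "example.edu"),
        ("ontology.schema.org", "11.0"),
    ]))
```

Run as a script, this prints a [general] section holding verbose=5 and class=, followed by [site] and [ontology] sections.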

examples

A simple configuration:

[general]
verbose=5
output=simple.out
class=

[site]
domain=example.edu
extension=html
index=index.html
root=~/www/htdocs

The website for example.edu can be found in ~/www/htdocs. Standard index files are called index.html, and html files always have the html extension.

The configuration outputs errors, warnings and info messages to simple.out. It analyses class identifiers, so it will scan CSS files for class names.

Links and virtual directories

Here’s an example of a site using virtual directories, accompanied by some link checking, including external links and crosslinked ids.

[general]
verbose=5
output=ssc.out

[site]
domain=example.edu
extension=html
index=index.html
root=~/www/htdocs
virtual=/net=tests/virtual

[link]
check=
external=
xlink=

The configuration file specifies a virtual directory. When a link refers to the local directory /net, ssc will seek the corresponding file(s) in tests/virtual (relative to the current directory, not the root).
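
To make the idea concrete, here is a minimal Python sketch of how a link under a virtual directory might be mapped to a file. It is a guess at the general technique, not ssc's actual implementation; parse_virtual and resolve_link are hypothetical names, and the sketch assumes site-absolute links without query strings or fragments.

```python
from pathlib import PurePosixPath


def parse_virtual(value):
    """Split a 'virtual' setting value such as '/net=tests/virtual'."""
    virtual, _, target = value.partition("=")
    return virtual, target


def resolve_link(link, root, virtuals):
    """Map a site-absolute link to a file path (hypothetical helper).

    virtuals maps a virtual directory (e.g. '/net') to a real directory.
    Links outside every virtual directory fall back to the site root.
    """
    path = PurePosixPath(link)
    for virtual, target in virtuals.items():
        vpath = PurePosixPath(virtual)
        # Match whole path components only, so '/network' is not under '/net'.
        if path == vpath or vpath in path.parents:
            return str(PurePosixPath(target) / path.relative_to(vpath))
    return str(PurePosixPath(root) / path.relative_to("/"))


if __name__ == "__main__":
    virtuals = dict([parse_virtual("/net=tests/virtual")])
    print(resolve_link("/net/a/b.html", "~/www/htdocs", virtuals))
    print(resolve_link("/about.html", "~/www/htdocs", virtuals))
```

The first call resolves into tests/virtual, the second into the site root.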

Microformats

Microformats contain machine-readable site information held in class attributes, the kind of thing that cheers up search engines. For more gen, browse their site, microformats.org.

[general]
verbose=5
output=ssc.out
class=

[site]
domain=example.edu
extension=html
index=index.html
root=~/www/htdocs

[microformat]
verify=

The only new thing here is the [microformat] section, which turns on microformat verification. This requires class analysis, hence class= in the [general] section.
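
For readers unfamiliar with microformats, the following Python sketch shows what "information held in class attributes" looks like in practice. It collects class names from a fragment of HTML and picks out those starting with h-, the prefix microformats2 uses for root items. It has nothing to do with how ssc verifies microformats; the class and function names are invented for the example, and older microformats (which use other class names) are ignored.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every class name seen on any start tag."""

    def __init__(self):
        super().__init__()
        self.classes = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.extend(value.split())


def microformat_roots(html):
    """Return class names with the microformats2 root prefix 'h-'."""
    collector = ClassCollector()
    collector.feed(html)
    collector.close()
    return [c for c in collector.classes if c.startswith("h-")]


if __name__ == "__main__":
    sample = '<div class="h-card"><span class="p-name">Ann</span></div>'
    print(microformat_roots(sample))
```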

HTML & SVG

A configuration file to check a site against HTML 5.2 and SVG 1.1 might contain:

[general]
output=site.out
class=

[link]
check=

[site]
domain=example.edu
extension=html
index=index.html
root=site

[html]
version=5.2

[svg]
version=1.1

living standard

A configuration file to check against a particular WhatWG living standard, gathering statistics:

[general]
output=jan21.out

[html]
version=2021/01/01

[link]
check=

[ontology]
schema.org=11.0

[site]
domain=example.edu
extension=html
index=index.html
root=site

[stats]
summary=
meta=

shadow output

A configuration file to shadow copy and deduplicate a site might contain:

[general]
output=dedu.out
class=

[site]
domain=example.edu
extension=html
index=index.html
root=site

[shadow]
copy=5
root=shadow
file=dedu.ndx

ontology

A configuration file to export ontology content from schema.org version 7.2 might contain:

[general]
output=export.out
class=

[site]
domain=example.edu
extension=html
index=index.html
root=site

[link]
check=

[ontology]
export=
root=export
schema.org=7.2

Actually, ssc will report on any microdata it finds, no matter what. Note that, if you use itemref in your pages, ssc may give false warnings in the section referenced by the itemref. This is because ssc does not always know that the referenced data is not intended to be used in its own right, unless you put it under a <TEMPLATE> element.

statistics

To gather some site stats:

[general]
ssi=
verbose=5
output=ssc.out
class=

[site]
domain=example.edu
extension=html
extension=shtml
index=index.html
root=~/www/htdocs

[stats]
selected=
summary=

This turns on summary statistics, which produce a grand total at the end of a complete run (add page= below summary for individual page stats). There are many other stats reports available.

Note also that server side includes have been turned on using general.ssi, and files with the SHTML extension will be treated as web pages.
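
Note that the [site] section above repeats the extension setting. If you want to inspect such a file from a script, be aware that Python's configparser rejects repeated settings in its default strict mode. The small reader below is a sketch that keeps every value of a repeated setting as a list instead. The name read_settings is invented, and the assumption that ssc accumulates repeated settings is inferred from the examples on this page rather than from ssc's source.

```python
def read_settings(text):
    """Parse INI-style text into {section: {setting: [values, ...]}}.

    Repeated settings accumulate; boolean settings ('name=') yield ''.
    Comments and other INI extras are deliberately not handled.
    """
    result = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            result.setdefault(section, {})
            continue
        if section is None:
            raise ValueError(f"setting outside any section: {line!r}")
        name, _, value = line.partition("=")
        result[section].setdefault(name, []).append(value)
    return result


if __name__ == "__main__":
    config = read_settings(
        "[general]\nssi=\n\n[site]\nextension=html\nextension=shtml\n"
    )
    print(config["site"]["extension"])
```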

MathML

This configuration file could be used to test MathML 2.

[general]
verbose=5
output=m2.out
class=

[link]
check=

[site]
domain=example.edu
extension=html
index=index.html
root=~/www/htdocs

[math]
version=2

[validation]
citype=function
citype=list
citype=logical
citype=matrix
citype=set
citype=var-x
citype=vector

What’s interesting here is the validation section. The MathML documentation lists a specific set of values that can be used with the TYPE attribute on the CI element. The examples in the specification give this attribute many additional, and apparently illegal, values. (This apparent inconsistency is not uncommon amongst web specifications.) The way to avoid ssc complaining about these extra values is to give CI TYPE additional values in the configuration file, as noted.

Indeed, many enumerated attribute values can be extended in this way. Use the --validation switch to get a complete list.
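
The general idea of extending an enumerated attribute can be sketched as a set union: the values a specification lists, plus whatever the configuration adds. The Python below is purely illustrative. SPEC_VALUES is a two-value stand-in, not the real list from the MathML 2 specification, and check_attribute is a hypothetical name that says nothing about ssc's internals.

```python
# Stand-in for the values a specification lists.
# NOT the real MathML 2 list for CI TYPE; two values are enough for the example.
SPEC_VALUES = {"ci": {"type": {"integer", "real"}}}


def check_attribute(element, attribute, value, extras=()):
    """Return True if value is allowed by the spec list or by the extras."""
    allowed = SPEC_VALUES[element][attribute] | set(extras)
    return value in allowed


if __name__ == "__main__":
    # Extra values as they might be gathered from citype= entries.
    extras = ["function", "list", "logical", "matrix", "set", "var-x", "vector"]
    print(check_attribute("ci", "type", "matrix"))
    print(check_attribute("ci", "type", "matrix", extras))
```

Without the extras the value matrix is rejected; with them it is accepted.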

real world examples

arts & ego

These files are based on the steps I take to update my OpenBSD hosted website, arts & ego.

Presume a directory containing the following:

site.conf    ssc configuration file for a website
site         shadow output produced by ssc

Then I run a script like this:

ssc -f site.conf
upload.sh site /var/www/site-upload server user 0
ssh user@server "cd /var/www ; mv site x ; mv site-upload site ;
mv x site-upload ; ln -sf site htdocs"

upload.sh is a macOS bash script that can be found among the source code. Note that I have rather naughtily replaced OpenBSD’s httpd document directory /var/www/htdocs with a link.

Here is a recent site.conf:

[general]
verbose=info
output=~/www/live.out
ignore=pre
rpt=
progress=
class=
classic=
no-rel=
ssi=

[css]
verify=
extension=css
version=2023++

[html]
version=2023/10/01
title=80
rfc1942=
rfc2070=
ie=
safari=

[link]
no-external=
xlink=
pretend=cgi-bin
local=

[nits]
silence=use_double_quote_code
silence=use_quote_code
silence=missing_itemtype

[shadow]
copy=dedu
root=~/www/live
file=~/www/live.ndx
ignore=inc
info=
msg=arts & ego © 1978-2024 dylan harris

[site]
domain=dylanharris.org
extension=shtml
extension=html
extension=htm
extension=asp
index=index.shtml
root=~/Sites

[spell]
no-check=

[stats]
summary=

[validation]
lang=ma
fontname=Marain
fontname=droid-sans-mono
fontname=ArialMT
fontname=Arial-BoldMT
fontname=Times-Italic
fontname=Times-Roman
fontname=TimesNewRomanPSMT
fontname=TrebuchetMS
fontname=TrebuchetMS-Italic

My site has been built by hand over the decades, and is full of errors. I wrote ssc because I was frustrated that I could find no tool to properly check it. Unfortunately, ssc is too successful: it finds a gadzillion errors in arts & ego. I am slowly making the repairs.

OpenBSD website

Perhaps you’ve grabbed a copy of the OpenBSD website from GitHub. A configuration file to check it might contain:

[general]
verbose=info
output=~/obsd.out
git=
progress=
sloven=

[css]
version=2023++
device=3

[html]
version=2023/10/01
title=80
force=

[link]
check=
no-xlink=

[nits]
comment=ftp_protocol

[site]
domain=openbsd.org
root=~/github/www
index=index.html

[stats]
summary=

Dylan Harris
November 2024