The Kermit Project   |   Now hosted by Panix.com
New York City USA   •   kermit@kermitproject.org
…since 1981

HTML...  A text-to-html conversion script

Author:  Frank da Cruz
Script version: 3.02
Dated: 2017/10/04
Platform:  Any version of Unix where C-Kermit is available.
Requires:  C-Kermit 9.0.304 Dev.22 or later
DOWNLOAD

This page last updated: Thu Oct 5 12:41:41 2017

Changes in html script version 3.02

Changes in html script version 3.00

Introduction

Because I have been writing code and prose for so many years, I have countless plain-text files lying around that might be useful to more people if they were on the Web. In 2004 I wrote a Kermit script to convert text files to HTML but it was too ambitious, trying to know things that were unknowable, so the result was often comical. But worse, I noticed that sometimes it lost chunks of text, and other times it failed altogether with some crazy error. That was version 1.00. I put it aside for 13 years.

In April 2017, when I uploaded C-Kermit 9.0.304 Dev.21, I wrote:

For details see the Update Notes file (scroll to the bottom and work your way up to where it says "-- 9.0.302 --" in the August 2011 section; sorry, it's an old-fashioned plain-text file, 9126 lines at last count, converting it to HTML would be an all-day project).
It turns out that some of the problems with the first HTML converter script were in Kermit itself, and this time I tracked them down and fixed them, and then I wrote a new HTML script that is simpler, cleaner, and less ambitious, but also more powerful in some ways. This was version 2.00 of May 1, 2017.

What the html script is

It's a C-Kermit script; that is, a program written in the C-Kermit command language. Presently it runs only on UNIX-based operating systems (if you don't know what that means, click here). You can look at the script by clicking here, and you can read more about Kermit scripts here. The html script requires C-Kermit 9.0.304 Dev.22 or later, because of fixes that were made in that version to correct the problem with missing chunks of text.

How to install the html script

First, you need to have C-Kermit 9.0.304 Dev.22 or later installed on your computer. You can get it here. Then you can download the getkermitscript script, which downloads Kermit scripts from the Kermit Project website and installs them for you.

How to invoke the html script

Assuming you have installed the script on your computer in a directory that is in your Unix PATH, and it has the filename “html”, then you can invoke it like this:
html inputfilename  "pagetitle" outputfilename
That is, the word “html” followed by the name of the text file you want to convert, and then, optionally, a title for the page enclosed in doublequotes ("), and a name for the output file. For example:
html notes.txt "My Notes" mynotes.html

If you don't specify a title for the page, the script will use the first line of the file, but only if it is followed by a blank line. If there is no such line, the title will be “Untitled”.
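The title rule above can be sketched in a few lines of Python. This is only an illustration of the rule as stated, not the script's own code (which is written in the C-Kermit command language); the function name pick_title and the handling of surrounding whitespace are assumptions.

```python
def pick_title(lines, explicit_title=""):
    """Choose a page title following the rule described above.

    lines: the text file's lines, without trailing newlines.
    explicit_title: the title given on the command line, if any.
    """
    if explicit_title:
        return explicit_title
    # The first line qualifies only if it is non-blank and the
    # line right after it is blank.
    if len(lines) >= 2 and lines[0].strip() and not lines[1].strip():
        return lines[0].strip()  # stripping whitespace is an assumption
    return "Untitled"


if __name__ == "__main__":
    print(pick_title(["C-Kermit Update Notes", "", "First paragraph..."]))
    print(pick_title(["No blank line after this one", "so no title"]))
```

The first call finds a usable first line; the second falls back to “Untitled” because the first line is not followed by a blank line.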

If you don't specify an output filename, the output file is given the name of the input file but with an .html extension; for example, notes.txt produces notes.html. It is placed in the same directory as the input file, unless you have defined a destination in your .htmlrc file (explained below). Let's say you do this, and that the text file has a first line suitable as a title; then you can just do:

html notes.txt
and the notes.html file will appear in the directory you indicated in .htmlrc (if any), otherwise in your current directory.
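Here is a minimal Python sketch of how the default output name could be derived. It is not taken from the script; the function name default_output_path is hypothetical, and it assumes that only the last extension is replaced and that, with no destination set, the result goes next to the input file.

```python
import os


def default_output_path(input_path, destination=""):
    """Derive an output filename when none is given on the command line.

    destination: the directory from .htmlrc, or "" if none is defined.
    """
    base = os.path.basename(input_path)
    stem, _ext = os.path.splitext(base)  # "notes.txt" -> "notes"
    # Assumption: without a destination, use the input file's directory.
    directory = destination or os.path.dirname(input_path)
    return os.path.join(directory, stem + ".html")


if __name__ == "__main__":
    print(default_output_path("notes.txt"))
    print(default_output_path("docs/notes.txt", "/home/me/web"))
```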

Using the html script in Unix pipelines

The html script can be used in a Unix pipeline (click here to read about pipelines). This is something new for Kermit scripts; it has never been possible before, and it depends on features that were added in C-Kermit 9.0.304 Dev.21 and Dev.22. What it means is that you can "pipe" the output of any Unix command into the html script and it will send the result to a file, to your screen, or to the next program in the pipeline. This is done as follows:
command | html "" "pagetitle" | command
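where the empty pair of doublequotes ("") occupies the position of the input filename and "pagetitle" is the title for the page. If you don't need to supply a title, you can simply do: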
command | html | command
A more useful technique is to redirect html's output to a file:
command | html > outputfilename
Here's a practical example that illustrates how you make a pipeline of Unix commands, each one doing its particular job:
man kermit | col -b | html > kermitmanpage.html
Here we turn the Unix man (manual) page for Kermit into an html document (of course you could do this with any Unix man page). Man pages are generally full of backspacing and overstriking and other special effects; the Unix “col -b” command takes out the special effects, and the result is piped into the html script, whose output is redirected to an html file.

How to customize the html script

Here are the default parameters the html script uses to create html files:
.destination =            # Destination directory
.cset = utf-8             # Character set of source file (see list)
.perms = 644              # Permissions for result
.lang = en                # Language tag (English)
.color = black            # Text color
.bg = white               # Background color
.font = sans-serif        # Font-family
.size = 15px              # Font-size (must include units)
.margin = 12px            # Margins (must include units)
.max-width =              # Maximum page width (pixels, no default)
.noconvertcset = 0        # Set to 1 to suppress character-set conversion
If you put them in your ~/.htmlrc file (that is, a file called .htmlrc in your Unix login directory), you can edit them however you wish; for example, to change the font size, to specify the directory name for your website, and so on. The items on the right are comments; they are ignored by the script. The assignments are on the left and have to be as shown: a period followed (with no spacing) by a variable name, a space, an '=' sign, another space, and the value for the variable. If no value is shown above then the item is not used unless you specify a value for it, as is the case for 'destination' and 'max-width'.
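To make the assignment format concrete, here is a rough Python sketch of a reader for lines in that shape. The real script is written in the C-Kermit command language and may well parse the file differently; the name parse_htmlrc and the comment rule (a '#' at the start of the value or after whitespace begins a comment) are assumptions made for illustration.

```python
import re

# ".name = value": a period, the name, a space, '=', and then
# (optionally) a space and the rest of the line.
_ASSIGNMENT = re.compile(r"^\.([A-Za-z][A-Za-z0-9-]*) =(?: (.*))?$")


def parse_htmlrc(text):
    """Return a dict of settings found in .htmlrc-style text."""
    settings = {}
    for raw in text.splitlines():
        match = _ASSIGNMENT.match(raw.rstrip())
        if not match:
            continue  # not an assignment line; skip it
        name = match.group(1)
        rest = match.group(2) or ""
        # Assumption: '#' at the start of the value, or after whitespace,
        # begins a comment. A value like "#ffffff" would therefore need
        # different handling than this sketch provides.
        value = re.split(r"(?:^|\s+)#", rest, maxsplit=1)[0].strip()
        settings[name] = value
    return settings


if __name__ == "__main__":
    sample = (
        ".destination =        # Destination directory\n"
        ".size = 15px          # Font-size (must include units)\n"
    )
    print(parse_htmlrc(sample))
```

An item with no value, such as destination in the sample, comes back as an empty string, which a caller could treat as "not used".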

In version 3.00 (or later) of the html script, you can also put parameters on the command line after the third parameter as name=value pairs (no dot, no spaces around '='). Command-line parameter settings override .htmlrc ones. Example:

html spaghetti.txt "Spaghetti recipe" spaghetti.html max-width=800px size=14px
Units for size, spacing, margins, etc., must be specified since this is required by HTML5; px is a safe choice.
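The override order (command line beats .htmlrc) amounts to a simple merge. The following Python sketch shows the idea under stated assumptions: the function name apply_overrides is hypothetical, the settings are assumed to be already loaded into a dictionary, and arguments without an '=' are simply ignored here, which may not match what the script does.

```python
def apply_overrides(settings, extra_args):
    """Merge name=value command-line parameters over .htmlrc settings.

    settings: dict of values loaded from .htmlrc (or the defaults).
    extra_args: the parameters that follow the third argument.
    """
    merged = dict(settings)  # leave the caller's dict untouched
    for arg in extra_args:
        name, sep, value = arg.partition("=")
        if sep and name:
            merged[name] = value  # the command line wins
    return merged


if __name__ == "__main__":
    from_rc = {"size": "15px", "font": "sans-serif"}
    print(apply_overrides(from_rc, ["max-width=800px", "size=14px"]))
```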

What the html script does

It reads a plain-text file, which can be in ASCII, ISO 8859-1, Windows Code Page 1252, UTF-8, or other encoding that you specify, and produces an HTML version with approximately the same formatting. Here are the rules:

The html script does not attempt to deal with:

The script has no way of knowing when it should switch between a proportional font and a fixed-width font to preserve the layout of tables or source code. The assumption is that the author of the plain-text file formatted it in the desired way; the script preserves the original formatting except when a proportional font is used in the html result page. You can override this in your .htmlrc file by specifying a font such as "monospace" or "courier", but this puts the entire page in the given font. Aside from that, there is no way to change fonts within a page.

Finally, the html script puts Top and Bottom anchors and links in the page.

To illustrate, here is a long plain-text file (14 years' worth of C-Kermit update notes):

NOTES.TXT
and here is the result of running it through the html script:
ckupdates.html

Improving the results

You can edit any html file produced by this script; for example to put selected parts of text in bold, italics, or monospace, or to add headings, etc. But if you run the html script on the same text file again, your changes will be lost. Therefore the main uses for the html script would be:
  1. To produce Web pages from text files that will not change;

  2. As a first step in migrating a text file to html, in which case the text file will no longer be used and all updates will be made to the html file.

Debugging

To see what the script is doing, put “DEBUG=1” at the beginning of the command line. Example:
DEBUG=1 html notes.txt

  C-Kermit HTML script / kermitproject.org / 1 May 2017 / Most recent update: 5 October 2017