As became clear from the posts Jekyll Tricks and Dispose of Typo3, I am a strong proponent of static web sites generated with tools.

I’ve used static site generation for over twelve years with Frame4 based tools. In 2019 I switched my web sites, as well as those I’m in charge of, to Jekyll / Liquid.

Hyphenation

A generous dose of conditional or soft hyphens (&shy;) in the site’s texts can greatly enhance the appearance, especially with variable sizes and menus. On the other hand, putting the &shy;s in by hand is troublesome and spoils the readability for the developer or author. Some tool support or automation would be welcome.

But alas, for many tools and editors, hyphenation is no bright story even when the language is mere English – and Jekyll is no exception here.
This post presents a tool that can add conditional (“soft”, &shy;) breaks to the (markdown) texts and (html) layouts, or remove them for the sake of source readability.

All you need is a current Frame4 (implementation version >= 1.21.06), best installed as an extension on a Java 8, and a hyphenation definition file in the form of this example, appropriate for your texts’ topic and language.

For all that follows you need the tool de.frame4j.FuR.
If you can start it and see the online help, e.g. by one of the following commands, you are all set.

java de.frame4j.FuR -help -de # show help in German (de)
java FuR -help -de # show help with comfort starter in English

Definition file

The excerpt of the hyphenation definition file example mentioned above shows the “one word per line with the hyphenation wanted” grammar:

---
# Definitions for [de-]hyphenation by de.frame4j.FuR

# A collection for German language for markdown texts and 
# Jekyll web layouts. This file's encoding is UTF-8.
# Copyright  2021  Albrecht Weinert     a-weinert.de
# $Revision: 58 $ # $Date: 2021-07-08 $)
---
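Aus&shy;bil&shy;dungs&shy;tag
Aus&shy;dauer
aus&shy;ge&shy;nom&shy;men
Be&shy;rufs&shy;wahl&shy;messe
Blu&shy;men&shy;ge&shy;stecke
dem&shy;sel&shy;ben
elek&shy;tro&shy;ni&shy;sches
Ent&shy;scheidungen
ent&shy;wor&shy;fen
er&shy;hielt
Er&shy;zie&shy;hungs&shy;be&shy;rech&shy;tigte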
...   and so on and so forth

For better readability, and for giving the file to a linguist for checking / correcting, use the tool FuR to replace &shy; by -.

D:\eclips...>java FuR hyphDef_de.txt -omitFrntM  "&shy;" "-"
  D:\eclipse18-09WS\web-hansibo\factory\hyphDef_de.txt
  157 occurrences of search texts

The first parameter names the file to work on (only one in this case), the option -omitFrntM says “Don’t touch a front matter”, and the second and third parameters define the pattern to find and its replacement. The result is

---
# Definitions for [de-]hyphenation by de.frame4j.FuR

# A collection for German language for markdown texts and 
# Jekyll web layouts. This file's encoding is UTF-8.
# Copyright  2021  Albrecht Weinert     a-weinert.de
# $Revision: 58 $ # $Date: 2021-07-08 $)
---
Aus-bil-dungs-tag
Aus-dauer
aus-ge-nom-men
Be-rufs-wahl-messe
Blu-men-ge-stecke
dem-sel-ben
elek-tro-ni-sches
Ent-scheidungen
ent-wor-fen
er-hielt
Er-zie-hungs-be-rech-tigte
...   and so on and so forth

After having done the copy editing in this form, do not forget to restore the form usable for [de-]hyphenation by

D:\eclips...>java FuR hyphDef_de.txt -omitFrntM  "-" "&shy;"
  D:\eclipse18-09WS\web-hansibo\factory\hyphDef_de.txt
  157 occurrences of search texts

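Except for the interchange of search pattern and replacement, this is the same command as for the other way round. But here the option -omitFrntM is crucial: without it, the hyphens in the file’s starting comment (in the --- delimiter lines, in UTF-8 or in the date, for example) would be replaced, too.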
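To illustrate what omitting the front matter means, here is a minimal sketch in Python. It is not FuR's code: the function names are hypothetical, and it assumes that a front matter is a leading block delimited by two lines consisting of ---, as in the definition file above.

```python
def split_front_matter(text):
    """Split text into (front_matter, body).

    Assumption: a front matter is a leading block delimited by two lines
    consisting of '---'. Without such a block, front_matter is empty.
    """
    lines = text.splitlines(keepends=True)
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                return "".join(lines[: i + 1]), "".join(lines[i + 1:])
    return "", text


def replace_outside_front_matter(text, search, replacement):
    """Replace search by replacement in the body only."""
    head, body = split_front_matter(text)
    return head + body.replace(search, replacement)


sample = (
    "---\n"
    "# encoding is UTF-8, $Date: 2021-07-08 $\n"
    "---\n"
    "Aus-dauer\n"
    "er-hielt\n"
)
result = replace_outside_front_matter(sample, "-", "&shy;")
print(result)
```

A plain sample.replace("-", "&shy;") would also hit UTF-8, the date and even the --- delimiter lines, which is exactly the damage the option is there to prevent.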

Hyphenate Jekyll’s markdown

To hyphenate a Jekyll-generated web site, best go to the root of the respective sources, e.g. by
    cd /D D:\eclipse18-09WS\web-hansibo\hansiboDE

For a realistic example’s sake, I assume we want to add hyphenation to all texts (extension .md for markdown) as well as to all Liquid templates (extension .html or .htm, so as not to confuse them with “real” pure HTML). We want to do this recursively in all sub directories, excluding .jekyll-cache, _data and _site.

Step 0: Remove all &shy;s

Before inserting &shy;s according to your favourite hyphenation definition file, you might want to remove all &shy;s randomly inserted by hand. At the root of the site’s Jekyll sources
    java FuR -r .md;.html;.htm -filUTF8 -OmitDirs _site;.jekyll-cache;_data -omitFrntM "&shy;" -v
would do the trick.
FuR: The most liked / used applications in Frame4 have a starter application in the unnamed package. FuR just delegates to de.frame4j.FuR.
-r .md;.html;.htm -filUTF8 : Recursively (from the current directory) visit all files with one of the three given extensions and assume their encoding to be UTF-8.
-OmitDirs _site;.jekyll-cache;_data : While visiting the sub-directory tree (starting from . ) omit the directories with the three names given, together with their children.
-omitFrntM : In the (text) files visited, omit a front matter when searching for patterns and replacing their occurrences.
"&shy;" (and no further parameter) : Search the pattern &shy; and replace it by nothing.
-v : Verbose output (optional). This option lists all files visited with the number of finds (and replacements).

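The file selection described by -r and -OmitDirs can be pictured with the following Python sketch. It only mirrors the description above and is not how FuR is implemented; the function name is hypothetical, and exact (case-sensitive) extension matching is an assumption.

```python
import os


def files_to_visit(root, extensions, omit_dirs):
    """Yield paths below root whose extension is in extensions,
    skipping every directory (with its children) named in omit_dirs."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in omit_dirs]
        for name in sorted(filenames):
            if os.path.splitext(name)[1] in extensions:
                yield os.path.join(dirpath, name)


if __name__ == "__main__":
    wanted = {".md", ".html", ".htm"}
    skipped = {"_site", ".jekyll-cache", "_data"}
    for path in files_to_visit(".", wanted, skipped):
        print(path)
```

Run at the root of the Jekyll sources, this lists the candidate files without touching them.
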
Step 1: Hyphenate by definition file

Using the same file and directory criteria as in the “Step 0” example and being in the same directory the command is:
    java FuR -r .md;.html;.htm -filUTF8 -OmitDirs _site;.jekyll-cache;_data -omitFrntM -hyphen ..\factory\hyphDef_de.txt

-hyphen ..\factory\hyphDef_de.txt : Hyphenate all files in question by the definitions found in the file (hyphDef_de.txt) named after the option.
all other options and parameters : As explained for “Step 0”.

Technically, for every line in the hyphenation file FuR generates a search pattern (without the &shy;s) and a replacement (with them). Then, broadly speaking, the same procedure is followed as when defining multiple search and replace definitions in an extra .properties file, which over the years was FuR’s primary work on servers.
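The step from definition lines to search and replace pairs can be sketched as follows. This is an illustration in Python under stated assumptions, not FuR's code: the names are hypothetical, the soft hyphen is taken to be the literal text &shy;, and Python's str.replace stands in for FuR's own search.

```python
SHY = "&shy;"


def read_definitions(lines):
    """Turn definition lines into (search, replacement) pairs.

    A leading block between two '---' lines is skipped as front matter;
    lines without a soft hyphen (blank lines, comments) are ignored.
    """
    lines = [line.strip() for line in lines]
    if lines and lines[0] == "---" and "---" in lines[1:]:
        lines = lines[lines.index("---", 1) + 1:]
    return [(word.replace(SHY, ""), word) for word in lines if SHY in word]


def hyphenate(text, pairs):
    """Apply every (search, replacement) pair in the given order."""
    for search, replacement in pairs:
        text = text.replace(search, replacement)
    return text


definition = """---
# Definitions for [de-]hyphenation
---
Aus&shy;dauer
er&shy;hielt
"""
pairs = read_definitions(definition.splitlines())
print(pairs)
print(hyphenate("Sie erhielt einen Preis für Ausdauer.", pairs))
```

Swapping the two elements of each pair gives the definitions for the opposite direction.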

Note 1: At present (19.07.2021) up to 1024 search and replace definitions are generated from the hyphenation definition file (this limit may be raised in future).
Note 2: Those past and present tasks of n patterns * m files, with both numbers in the order of 100s and 1000s, would not go well with Java’s standard String search. Like other Frame4 tools, de.frame4j.FuR uses Frame4’s implementation of the Rabin-Karp algorithm. Rabin-Karp brings the search from O(t*s) (naive String.indexOf() and consorts) down to about O(t + s) on average, where t is the length of the text and s is the length of the substring to spot.
Note 3: Step 1 is the main step, the one all this was done for. We generate and (ftp) deploy web sites by Jekyll on the SVN (subversion) server in a post commit hook. And we very well may put FuR -hyphen ... there.
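For readers who do not know the algorithm named in Note 2, here is a textbook-style sketch of Rabin-Karp for a single pattern in Python. It is not Frame4's implementation; the function name, base and modulus are illustrative choices. The idea is to compare a rolling hash of the current text window with the pattern's hash and to compare characters only when the hashes agree.

```python
def rabin_karp_find_all(text, pattern, base=256, modulus=1_000_000_007):
    """Return the start indices of all occurrences of pattern in text."""
    t, s = len(text), len(pattern)
    if s == 0 or s > t:
        return []
    high = pow(base, s - 1, modulus)  # weight of the window's first character
    pattern_hash = window_hash = 0
    for i in range(s):
        pattern_hash = (pattern_hash * base + ord(pattern[i])) % modulus
        window_hash = (window_hash * base + ord(text[i])) % modulus
    hits = []
    for start in range(t - s + 1):
        # Equal hashes may be a collision, so confirm by comparing characters.
        if window_hash == pattern_hash and text[start:start + s] == pattern:
            hits.append(start)
        if start + s < t:
            # Roll the window: drop text[start], append text[start + s].
            window_hash = (window_hash - ord(text[start]) * high) % modulus
            window_hash = (window_hash * base + ord(text[start + s])) % modulus
    return hits


print(rabin_karp_find_all("runtergekommen und gekommen", "kommen"))
```

Each window update costs constant time, which is where the average O(t + s) comes from; only with many hash collisions does the character comparison push the cost back towards O(t*s).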

Step 2: De-Hyphenate by definition file

With a command very similar to the one in “Step 1” FuR can remove (only) the hyphenations applied:
    java FuR -r .md;.html;.htm -filUTF8 -OmitDirs _site;.jekyll-cache;_data -omitFrntM -dehyphen ..\factory\hyphDef_de.txt

-dehyphen ..\factory\hyphDef_de.txt : De-hyphenate all files in question by the definitions found in the file (hyphDef_de.txt) named after the option. The procedure is the same as explained in “Step 1”, except for the reversal of the find pattern and the replacement.
all other options and parameters : As explained for “Step 0”.

Applying “Step 1” to a set of “hyphen-less” files (by “Step 0”, e.g.) and then applying “Step 2” should bring the set of files or texts back to the previous state. In other words, one operation is the inverse of the other.
Note 4: Be aware of shadowing effects, especially with compound words like (German) runter, gekommen and runtergekommen, which would be defined as e.g.

run&shy;ter&shy;ge&shy;kom&shy;men
run&shy;ter
ge&shy;kom&shy;men
kom&shy;men

If you change the order here by, e.g., setting "kom&shy;men" (kommen) before the others, this would inhibit the extra hyphenations of gekommen and runtergekommen.
Note 5: As de.frame4j.FuR on -dehyphen just reverses the roles of find and replace used for -hyphen, the “inverse operation” property should hold.
Note 6: When writing or editing a hyphenation definition file, always have -hyphen in mind. Rule of thumb: compounds and longer words one gets by adding to shorter ones come first.

Mathematical question to the learned readers:
Would just the length of the non-hyphenated word be the right sorting criterion?

Well, I think the answer is yes. Hence, since 31.07.2021 (resp. Implementation-Version 1.21.07) FuR sorts the hyphenation definitions accordingly before use.
Note 7: If you use this Frame4 version or a newer one (and don’t switch off this sorting) you may ignore Notes 4 and 6.
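The shadowing effect and the cure by sorting can be tried out with a small Python sketch. It is a model under assumptions, not FuR's code: the names are hypothetical, &shy; is handled as literal text, and the definitions are applied one after the other with str.replace.

```python
SHY = "&shy;"


def to_pairs(definitions):
    """Map each hyphenated word to a (plain, hyphenated) pair."""
    return [(word.replace(SHY, ""), word) for word in definitions]


def sort_longest_first(pairs):
    # sorted() is stable, so equally long words keep their given order.
    return sorted(pairs, key=lambda pair: len(pair[0]), reverse=True)


def hyphenate(text, pairs):
    for plain, hyphenated in pairs:
        text = text.replace(plain, hyphenated)
    return text


def dehyphenate(text, pairs):
    for plain, hyphenated in pairs:
        text = text.replace(hyphenated, plain)
    return text


# Deliberately bad order: the shortest word comes first.
definitions = [
    "kom&shy;men",
    "run&shy;ter",
    "ge&shy;kom&shy;men",
    "run&shy;ter&shy;ge&shy;kom&shy;men",
]
text = "runtergekommen"
unsorted_result = hyphenate(text, to_pairs(definitions))
sorted_pairs = sort_longest_first(to_pairs(definitions))
sorted_result = hyphenate(text, sorted_pairs)
print(unsorted_result)  # some break points are lost
print(sorted_result)    # all break points are present
print(dehyphenate(sorted_result, sorted_pairs) == text)
```

Length works as a criterion in this model because a word contained in another one is always shorter than it, so with the longest words first no definition can be applied before a longer one that contains it.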

Happy automatic hyphenation.