I’ve used static site generation for over twelve years with Frame4J based tools. In 2019 I switched my web sites, as well as those I’m in charge of, to Jekyll / Liquid.
A generous dose of conditional or soft hyphens (&shy;)
in the site’s texts can greatly enhance the appearance, especially with
variable sizes and menus. On the other hand, putting the “&shy;s” in
by hand is troublesome and spoils the readability for the
developer or author. Some tool support or automation would be welcome.
But alas, for many tools and editors hyphenation is no success story,
even when the language is mere English, and Jekyll is no exception here.
This post presents a tool able to add conditional (“soft”, &shy;) breaks to the (markdown) texts and (html) layouts, or to remove them for the sake of source readability.
All you need is a current Frame4J (implementation version >= 1.21.06), best installed as an extension on Java 8, and a hyphenation definition file in the form of this example, appropriate for your texts’ topic and language.
For all that follows you need the tool de.frame4j.FuR.
If you can start it and see the online help, e.g. by one of the following commands, you are all set.
java de.frame4j.FuR -help -de # show help in German (de)
java FuR -help -de # show help with comfort starter in English
The excerpt of the hyphenation definition file example mentioned above shows the “one word per line with the hyphenation wanted” grammar:
---
# Definitions for [de-]hyphenation by de.frame4j.FuR
# A collection for German language for markdown texts and
# Jekyll web layouts. This file's encoding is UTF-8.
# Copyright 2021 Albrecht Weinert a-weinert.de
# $Revision: 58 $
# $Date: 2021-07-08 $)
---
Aus&shy;bil&shy;dungs&shy;tag
Aus&shy;dauer
aus&shy;ge&shy;nom&shy;men
Be&shy;rufs&shy;wahl&shy;messe
Blu&shy;men&shy;ge&shy;stecke
dem&shy;sel&shy;ben
elek&shy;tro&shy;ni&shy;sches
Ent&shy;scheidungen
ent&shy;wor&shy;fen
er&shy;hielt
Er&shy;zie&shy;hungs&shy;be&shy;rech&shy;tigte
... and so on and so forth
For better readability, and for giving the file to a linguist for
checking / correcting, use the tool FuR to replace every &shy; by a plain hyphen (-):
D:\eclips...>java FuR hyphDef_de.txt -omitFrntM "&shy;" "-"
D:\eclipse18-09WS\web-hansibo\factory\hyphDef_de.txt 157 occurrences of search texts
The first parameter names the (only one in this case) file to work on, the
-omitFrntM says “Don’t touch a front matter” and the
second and third parameter define the pattern to find and its replacement.
The result is
---
# Definitions for [de-]hyphenation by de.frame4j.FuR
# A collection for German language for markdown texts and
# Jekyll web layouts. This file's encoding is UTF-8.
# Copyright 2021 Albrecht Weinert a-weinert.de
# $Revision: 58 $
# $Date: 2021-07-08 $)
---
Aus-bil-dungs-tag
Aus-dauer
aus-ge-nom-men
Be-rufs-wahl-messe
Blu-men-ge-stecke
dem-sel-ben
elek-tro-ni-sches
Ent-scheidungen
ent-wor-fen
er-hielt
Er-zie-hungs-be-rech-tigte
... and so on and so forth
After having done the copy editing in this form do not forget to restore the form usable for [de-] hyphenation by
D:\eclips...>java FuR hyphDef_de.txt -omitFrntM "-" "&shy;"
D:\eclipse18-09WS\web-hansibo\factory\hyphDef_de.txt 157 occurrences of search texts
Except for the interchange of search pattern and replacement, it is the same
command as for the other way round. But here the option -omitFrntM is
crucial, lest the replacement spoil the file’s starting comment, which contains plain hyphens.
Hyphenate Jekyll’s markdown
To hyphenate a Jekyll-generated web site, best go to the respective sources:
cd /D D:\eclipse18-09WS\web-hansibo\hansiboDE
For the sake of a realistic example, I assume we want to add hyphenation to
all texts (extension .md for markdown) as well as to all Liquid templates
(extension .html or .htm, so as not to confuse them with “real” pure HTML).
We want to do this recursively in all sub-directories, excluding _site, .jekyll-cache and _data.
Step 0: Remove all &shy;s
Before inserting &shy;s according to your favourite hyphenation
definition file, you might remove all &shy;s randomly inserted by hand.
At the root of the site’s Jekyll sources
java FuR -r .md;.html;.htm -filUTF8 -OmitDirs _site;.jekyll-cache;_data -omitFrntM "&shy;"
would do the trick.
FuR: The most liked and most used applications in Frame4J have a starter application in the unnamed package. FuR just delegates to de.frame4j.FuR.
-r .md;.html;.htm -filUTF8 : Recursively (from the current directory) visit all files with one of the three given extensions and assume their encoding to be UTF-8.
-OmitDirs _site;.jekyll-cache;_data : While visiting the sub-directory tree (starting from . ), omit directories with one of the three given names, together with their children.
-omitFrntM : In the (text) files visited, leave a front matter out of the process of searching patterns and replacing their occurrences.
"&shy;" (and no further parameter): Search for &shy; and replace it by nothing.
-v : Verbose output (optional). This option lists all files visited with the number of finds (and replacements).
Step 1: Hyphenate by definition file
Using the same file and directory criteria as in the “Step 0” example
and being in the same directory the command is:
java FuR -r .md;.html;.htm -filUTF8 -OmitDirs _site;.jekyll-cache;_data -omitFrntM -hyphen ..\factory\hyphDef_de.txt
-hyphen ..\factory\hyphDef_de.txt : Hyphenate all files in question by the definitions found in the file named after the option.
All other options and parameters: As explained for “Step 0”.
Technically, for every line in the hyphenation file FuR generates a
search pattern (the word without the &shy;s) and a replacement (the word with
them). Then, broadly speaking, the same procedure is used as when defining
multiple search and replace definitions in an extra .properties file, which
over the years was FuR’s primary work on servers.
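The derivation of search and replace pairs described above can be sketched in a few lines of Java. This is a minimal illustration, not FuR's actual code: the class name `HyphenPairs`, its methods, the use of the literal text `&shy;` as the soft-hyphen marker and the plain `String.replace` loop are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: turn hyphenation definition lines into search/replace pairs. */
public class HyphenPairs {
    /** Assumed marker for a soft hyphen in the definition file and in the texts. */
    public static final String SHY = "&shy;";

    /** Returns pairs {search, replacement}; front matter and lines without a marker are skipped. */
    public static List<String[]> fromLines(List<String> lines) {
        List<String[]> pairs = new ArrayList<>();
        boolean inFrontMatter = false;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.equals("---")) {          // a front matter starts and ends with ---
                inFrontMatter = !inFrontMatter;
                continue;
            }
            if (inFrontMatter || !line.contains(SHY)) continue;
            String search = line.replace(SHY, ""); // the word without soft hyphens
            pairs.add(new String[] { search, line });
        }
        return pairs;
    }

    /** Applies all pairs one after the other (naive stand-in for FuR's search engine). */
    public static String hyphenate(String text, List<String[]> pairs) {
        for (String[] p : pairs) text = text.replace(p[0], p[1]);
        return text;
    }

    public static void main(String[] args) {
        List<String> def = Arrays.asList(
            "---", "# Definitions for [de-]hyphenation", "---",
            "Aus&shy;dauer", "er&shy;hielt");
        List<String[]> pairs = fromLines(def);
        System.out.println(hyphenate("Sie erhielt Lob für ihre Ausdauer.", pairs));
        // prints: Sie er&shy;hielt Lob für ihre Aus&shy;dauer.
    }
}
```

Swapping the two elements of each pair gives the de-hyphenating direction.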
Note 1: At present (19.07.2021)
up to 1024 search and replace definitions are generated from the hyphenation
definition file (this limit may be raised in future).
Note 2: Those past and present tasks of n patterns * m files, with those numbers in the order of 100s and 1000s, would not go well with Java’s standard String search. Like other Frame4J tools, de.frame4j.FuR uses Frame4J’s implementation of the Rabin-Karp algorithm. Rabin-Karp brings the search from O(t*s) (naive String.indexOf() and consorts) to an expected O(t + s), where t is the length of the text and s is the length of the substring to spot.
Note 3: Step 1 is the main step, the one all this was done for. We generate and (ftp) deploy web sites by Jekyll on the SVN (subversion) server in a post commit hook. And we may very well put
FuR -hyphen ...
into that hook.
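To illustrate the Rabin-Karp idea mentioned in Note 2, here is a minimal single-pattern sketch. It is not Frame4J's implementation; the class name `RabinKarpSketch`, the base and the modulus are arbitrary choices for the example. A rolling hash of the current text window is compared with the pattern's hash, and characters are compared only when the hashes agree.

```java
/** Hypothetical sketch of a single-pattern Rabin-Karp search with a rolling hash. */
public class RabinKarpSketch {
    private static final int BASE = 256;       // assumed radix for the polynomial hash
    private static final int MOD = 1_000_003;  // assumed modulus keeping hashes small

    /** Returns the first index of pat in text, or -1 (same contract as String.indexOf). */
    public static int indexOf(String text, String pat) {
        int t = text.length(), s = pat.length();
        if (s == 0) return 0;
        if (s > t) return -1;

        long high = 1;                          // BASE^(s-1) mod MOD, weight of the leading char
        for (int i = 1; i < s; i++) high = (high * BASE) % MOD;

        long hashPat = 0, hashWin = 0;          // hashes of the pattern and of the first window
        for (int i = 0; i < s; i++) {
            hashPat = (hashPat * BASE + pat.charAt(i)) % MOD;
            hashWin = (hashWin * BASE + text.charAt(i)) % MOD;
        }

        for (int i = 0; ; i++) {
            // Equal hashes are only a hint; verify to rule out collisions.
            if (hashPat == hashWin && text.regionMatches(i, pat, 0, s)) return i;
            if (i + s >= t) return -1;
            // Roll the window: drop text[i], append text[i + s].
            hashWin = (hashWin - text.charAt(i) * high % MOD + MOD) % MOD;
            hashWin = (hashWin * BASE + text.charAt(i + s)) % MOD;
        }
    }

    public static void main(String[] args) {
        System.out.println(indexOf("runtergekommen", "kommen")); // prints 8
    }
}
```

With many patterns, the usual extension is to keep the pattern hashes in a set and to look up each window hash there; how Frame4J handles patterns of different lengths is not covered by this sketch.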
Step 2: De-Hyphenate by definition file
With a command very similar to the one in “Step 1” FuR can remove (only)
the hyphenations applied:
java FuR -r .md;.html;.htm -filUTF8 -OmitDirs _site;.jekyll-cache;_data -omitFrntM -dehyphen ..\factory\hyphDef_de.txt
-dehyphen ..\factory\hyphDef_de.txt : De-hyphenate all files in question by the definitions found in the file named after the option. The procedure is the same as explained in “Step 1”,
except for the reversal of the find pattern and the replacement.
All other options and parameters: As explained for “Step 0”.
Applying “Step 1” to a set of “hyphen-less” files (made so by “Step 0”, e.g.) and
then applying “Step 2” should bring the files, and hence the texts,
back to their previous state. In other words, one operation is the inverse
of the other.
Note 4: Be aware of shadowing effects, especially with compound words like (German) runter, gekommen and runtergekommen, which would be defined as e.g.
run&shy;ter&shy;ge&shy;kom&shy;men
run&shy;ter
ge&shy;kom&shy;men
kom&shy;men
If you change the order here by, e.g., setting kom&shy;men
(kommen) before the others, this would inhibit the extra hyphenations
of gekommen and runtergekommen.
Note 5: As de.frame4j.FuR on -dehyphen reverses the order of find and replace
relative to the one used on
-hyphen, the “inverse operation” property should hold.
Note 6: When writing or editing a hyphenation definition file always have
-hyphen in mind. Rule of thumb: put compounds and longer words,
which one gets by adding to shorter ones, first.
Mathematical question to the learned readers:
Would just the length of the non-hyphenated word be the right sorting criterion?
Well, I think the answer is yes. Hence, since 31.07.2021 (resp.
Implementation-Version 1.21.07) FuR sorts the hyphenation definitions
accordingly before use.
Note 7: If you use this Frame4J version or a newer one (and don’t switch off this sorting) you may ignore Notes 4 and 6.
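The shadowing effect of Note 4 and the "longest word first" remedy can be tried out with a small sketch. It assumes that the definitions are applied one after the other by plain string replacement and that the soft hyphen appears as the literal text `&shy;`; the class `HyphenOrderSketch` and its methods are hypothetical and say nothing about how FuR works internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: shadowing by short words and sorting by plain word length. */
public class HyphenOrderSketch {
    public static final String SHY = "&shy;";   // assumed soft-hyphen marker

    /** The definition without its soft hyphens, i.e. the search pattern. */
    public static String plain(String def) { return def.replace(SHY, ""); }

    /** Applies the definitions in list order: plain word -> hyphenated word. */
    public static String hyphenate(String text, List<String> defs) {
        for (String d : defs) text = text.replace(plain(d), d);
        return text;
    }

    /** The inverse direction: hyphenated word -> plain word. */
    public static String dehyphenate(String text, List<String> defs) {
        for (String d : defs) text = text.replace(d, plain(d));
        return text;
    }

    /** Returns a copy sorted by descending length of the non-hyphenated word. */
    public static List<String> longestFirst(List<String> defs) {
        List<String> sorted = new ArrayList<>(defs);
        sorted.sort(Comparator.comparingInt((String d) -> plain(d).length()).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<String> badOrder = Arrays.asList("kom&shy;men", "run&shy;ter",
            "ge&shy;kom&shy;men", "run&shy;ter&shy;ge&shy;kom&shy;men");
        String text = "Er ist runtergekommen.";
        // Short words first: the compound is no longer found.
        System.out.println(hyphenate(text, badOrder));
        // prints: Er ist run&shy;tergekom&shy;men.
        String good = hyphenate(text, longestFirst(badOrder));
        System.out.println(good);
        // prints: Er ist run&shy;ter&shy;ge&shy;kom&shy;men.
        System.out.println(dehyphenate(good, longestFirst(badOrder)).equals(text)); // prints true
    }
}
```

A word that contains another word is never shorter than it, so sorting by descending length always handles a compound before its parts.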
Happy automatic hyphenation.