Creating one-page PDF with fit-to-page text
I am looking for a way to generate single-page PDF files from text of arbitrary length, with the font size automatically fitted to the page, reasonable margins, and the text centered horizontally and vertically.
command --text="Text of arbitrary length" --output=one-page-file.pdf
That is, I want to re-create
magick -gravity center -background white -fill black -size 1728x972 -font /Users/marekkowalczyk/Library/Fonts/RobotoMono-Medium.ttf caption:"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat." -background white -extent 1920x1080 long.pdf
where the output file is a "true" PDF, not an image file embedded in a PDF. Obviously this means replacing ImageMagick with a tool that generates PDF (PostScript? TeX?).
Solution 1:[1]
Update 2022-03-03:
In onepg.sh a - (dash) has been added after paste -s -d ' ' to
specify stdin as input.
If ghostscript says "Could not open the file /dev/stdout" I suggest
editing onepg.sh as follows: change /dev/stdout to %stdout,
/dev/stderr to %stderr, and -f /dev/stdin to -f - (but leave
: ${infile='/dev/stdin'} as is). (end update)
It's been a while since this question was asked. Nevertheless...
Here's a PostScript program (onepg.ps) and a POSIX shell script
(onepg.sh) using ghostscript 9.50 to create a one-page PDF
adjusting font size to fill the page. Run as:
echo 'BZZZT ~ Train leaving in 45 minutes' | ./onepg.sh > bzzzt.pdf
or, to convert a Central or Eastern European plaintext file to a PDF with blue text,
tocode=latin2 rgbtext='0 0 255' ./onepg.sh < some.txt > some.pdf
or, for a compact standalone PostScript file and a trace log file,
TRACE=x logfile=file.log psoutfile=file.ps outfile=file.null ./onepg.sh < file.txt
or, for an A5-size PNG file in landscape orientation,
PAPERSIZE=a5 landscape=x outfile=file.png ./onepg.sh < file.txt
The driver shell script
- supplies default values for files, encoding, page size and margins, font etc.
- converts and formats the input text, which may contain ~ (tilde blank) as section delimiter, using the standard tools iconv and sed
- emits PostScript startup code to call the vert-centr procedure in onepg.ps
- invokes ghostscript to produce the output file; default format is PDF
- uses shell parameter expansion, e.g. ${outfile='/dev/stdout'}, to supply those defaults
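As a rough illustration of the text-formatting step, the sketch below wraps the same sed expressions that onepg.sh uses in a function. The name ps_string is hypothetical and not part of the script, and iconv is left out so that the result does not depend on the local iconv implementation.

```sh
#!/bin/sh
# ps_string (hypothetical helper name): read text on stdin and write it
# as one PostScript string literal, using the sed expressions from
# onepg.sh:
#   1. put a backslash before every backslash and parenthesis
#   2. end every line except the last with a backslash
#   3. open the string with "(" on the first line
#   4. close it with ")" on the last line
ps_string() {
    sed -e 's/[\\()]/\\&/g' \
        -e '$!s/$/\\/' \
        -e '1s/^/(/' \
        -e '$s/$/)/'
}

# Usage: parentheses in the input come out escaped.
printf 'Train (express) leaves at 9\n' | ps_string
```

For this input the function prints (Train \(express\) leaves at 9), which PostScript reads back as the original text.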
Caution: Long words in a very short text may be truncated due to enlarged font size.
I should mention that I do PostScript once in a purple moon.
For a one-page output efficiency isn't a big concern and the algorithm
used is quite simple. The vert-centr proc invokes adjustfont which
computes the font size so the text fills the artbox (extent of the
page's meaningful content) by repeatedly calling linebreakr while
bisecting the font size range (6 to 144 by default). It stops when the line count equals
floor(artbox height / font size) or when the computed font size no
longer changes. Finally vert-centr displays the page distributing
excess vertical whitespace evenly between lines and centring lines
horizontally; no other formatting is done.
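The search described above can be sketched in plain shell. This is not the PostScript code itself but a simplified model: it assumes every glyph is 0.6 times the font size wide (PostScript's stringwidth would give the real width), and the function names count_lines and fit_font are made up for the illustration. The range 6 to 144 and the two stop conditions follow adjustfont.

```sh
#!/bin/sh
# count_lines TEXT FONTSIZE BOXWIDTH
# Greedy word wrap; prints the number of lines needed.
# Assumption: every glyph is 0.6 * FONTSIZE wide, a monospace stand-in
# for PostScript's stringwidth. Widths are kept in tenths of a point so
# that integer arithmetic suffices.
count_lines() {
    set -f                        # no pathname expansion while splitting
    text=$1 fs=$2 maxw=$(( $3 * 10 ))
    space=$(( fs * 6 ))
    lines=0 linew=0
    for word in $text; do
        wordw=$(( ${#word} * fs * 6 ))
        if [ "$linew" -eq 0 ]; then
            lines=1 linew=$wordw                  # first word of the text
        elif [ $(( linew + space + wordw )) -gt "$maxw" ]; then
            lines=$(( lines + 1 )) linew=$wordw   # start a new line
        else
            linew=$(( linew + space + wordw ))    # append to current line
        fi
    done
    echo "$lines"
}

# fit_font TEXT BOXWIDTH BOXHEIGHT
# Bisect the font size between 6 and 144; prints "fontsize linecount".
fit_font() {
    text=$1 boxw=$2 boxh=$3
    lo=6 hi=144 fs=1
    while :; do
        last=$fs
        fs=$(( (lo + hi) / 2 ))              # integer midpoint
        n=$(count_lines "$text" "$fs" "$boxw")
        q=$(( boxh / fs ))                   # floor(height / font size)
        # stop on an exact fit or when the size no longer changes
        if [ "$q" -eq "$n" ] || [ "$fs" -eq "$last" ]; then
            break
        fi
        if [ "$q" -lt "$n" ]; then
            hi=$fs                           # too many lines: shrink
        else
            lo=$fs                           # room to spare: grow
        fi
    done
    echo "$fs $n"
}

# Usage: four short words in a 452 x 707 point box.
fit_font 'aaaa bbbb cccc dddd' 452 707
```

Under these assumptions the call prints 142 4: at size 142 each word takes its own line and four lines are exactly floor(707 / 142).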
The encodefont proc supports ASCII (StandardEncoding), Latin-1, and
Latin-2. Input text is converted by iconv's
--to-code="...//TRANSLIT" and so may not be represented accurately.
//TRANSLIT is convenient for UTF-8 input but leaves ? in the output
if transliteration cannot be done.
If the onepg.sh script is invoked with a non-empty TRACE shell
variable the artbox is outlined in the output file and the following is
written to a trace log (stderr, by default):
- artbox dimensions
- table of the font size computation:
  - font size min:max
  - current font size
  - line count
  - floor(artbox height / font size)
  - stringwidth of text in current font
- Y coordinates of each line
Sample trace log:
artbox: x=71 y=67 w=452 h=707 x+w=523 y+h=774
szrg ftsz lnct h/ftsz textw
6:144 75 69 9 23870
6:75 40 34 17 12730
6:40 23 19 30 7320
23:40 31 26 22 9866
23:31 27 23 26 8593
27:31 29 24 24 9229
lnypos: 749.0 719.542 690.083 660.625 631.167 601.708 572.25 542.792 513.333 483.875 454.417 424.958 395.5 366.042 336.583 307.125 277.667 248.208 218.75 189.292 159.833 130.375 100.917 71.4584
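The first and last lines of this sample trace can be checked by hand. The sketch below assumes the script defaults (A4 at 595 x 842 points, margins of 12 % and 8 %) and the final row of the table (font size 29, 24 lines). It uses awk instead of the bc and PostScript arithmetic of the originals, and the function names artbox and baselines are made up for the illustration.

```sh
#!/bin/sh
# artbox WIDTH HEIGHT MARGINX MARGINY
# Prints "x y w h" of the area inside the margins, each value rounded
# to the nearest integer, as mediabox2artbox does with bc.
artbox() {
    awk -v w="$1" -v h="$2" -v mx="$3" -v my="$4" 'BEGIN {
        printf "%d %d %d %d\n",
            int(w * mx + 0.5), int(h * my + 0.5),
            int(w - 2 * w * mx + 0.5), int(h - 2 * h * my + 0.5)
    }'
}

# baselines ABY ABH FONTSIZE LINECOUNT
# Prints the Y position of each line, following the lnyadj and lnypos
# formulas in vert-centr: leftover height is shared evenly among lines.
baselines() {
    awk -v y="$1" -v h="$2" -v fs="$3" -v n="$4" 'BEGIN {
        adj = (h - fs * n) / n          # extra space per line
        pos = h + y + adj + 4           # start above the first line
        for (i = 1; i <= n; i++) {
            pos -= fs + adj
            printf "%s%.3f", (i > 1 ? " " : ""), pos
        }
        print ""
    }'
}

artbox 595 842 0.12 0.08
baselines 67 707 29 24
```

With these inputs artbox prints 71 67 452 707, and baselines prints 24 values that start at 749.000 and end at 71.458, in line with the artbox and lnypos rows above. The trace shows 71.4584 for the last value, which probably reflects PostScript's single-precision reals.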
File: onepg.ps
% onepg.ps -- convert text to fit one page, adapting font
%
% Notes:
% - invoke with accompanying POSIX shell script onepg.sh
% - intended for one-page texts, not for extreme-size texts or words
% - supports section breaks, e.g. (end para.~ Next para), see /SECT
% NB: section delimiter must be followed by a word delimiter (blank)
% - /fsMin, /fsMax font sizes are defined in /adjustfont
% - for StandardEncoding /encodefont is not needed
% - use Latin-2 encoding vector for ISO 8859-2 compatibility
% - tested with ghostscript 9.50, evince 3.36.7, okular 1.9.3
/TRACE false def % trace info flag
/SECT (~) 0 get def % section delimiter char (use 7bit ascii)
/Trace { % (string) --> ...
TRACE { print flush } if
} bind def
/strN { % any --> (string)
32 string cvs
} bind def
% Concatenate N strings.
% (s1) (s2) (s3) ... (sN) n --> (s1s2s3...sN)
% origin: https://stackoverflow.com/a/12472783 (with comments)
/ncat {
dup 1 add
copy
0 exch { exch length add } repeat
string exch
0 exch
-1 1 {
2 add -1 roll
3 copy putinterval
length add
} for
pop
} def
% Split text into lines, call back for each, return line count.
% NB: newlines get no special treatment (so replace with word delimiter)
% stack: text word-delimiter maxwidth eolproc(lntext,lnwidth) --> lnct
/linebreakr {
0 begin
/eolproc exch def
/maxlinewidth exch cvr def
/delim exch def
/qtxt exch def % queued text
/qtxtlen qtxt length def
/qtxtlnct 0 def
/delimlen delim length def
/delimwd delim stringwidth pop def
{
qtxtlen 0 le { exit } if
/qtxtlnct qtxtlnct 1 add def
/lntxt qtxt def % rest of current line
/lnlen 0 def
/lnwidth 0.0 def
{ % process current line
% string seek <search> post match pre true
% string seek <search> string false
lntxt delim search % look for next delimiter
/inq exch def % queue not empty if found
/nextword exch def
/nextwordlen nextword length def
inq { pop /lntxt exch def } if
/atsect 0 def % SECT at end of nextword?
nextwordlen 0 ne { % if
nextword nextwordlen 1 sub get SECT eq { % if
/atsect 1 def
/qtxtlnct qtxtlnct 1 add def
/nextword nextword 0 nextwordlen 1 sub getinterval def
} if
} if
% at end of line if passing max unless no words
% seen, in which case truncating a rather long word,
% cf. https://en.wikipedia.org/wiki/Longest_words
/wordwidth nextword stringwidth pop def
lnwidth wordwidth add maxlinewidth gt lnlen 0 gt and {
exit % FIXME: better to add delimwd before exit
} if
/lnwidth lnwidth wordwidth add delimwd add def
inq not atsect 0 ne or {
/lnlen lnlen nextwordlen add def
exit
} if
/lnlen lnlen nextwordlen add delimlen add def
} loop % line
% call back line+width
qtxt 0 lnlen atsect sub getinterval lnwidth delimwd sub eolproc
atsect 0 ne { () 0.0 eolproc } if % call back linefeed
% skip to next line
/qtxtlen qtxtlen lnlen sub def
/qtxt qtxt lnlen qtxtlen getinterval def
} loop % text
qtxtlnct % return line count
end % dict
} def
/linebreakr load 0 16 dict put
% Adjust font size to fill artbox by repeatedly calling linebreakr.
% stack: fontname artbox text word-delimiter --> fontsize linect
%
% Returns when linect == floor(artbox-height / fontsize)
% or when fontsize no longer changes after call to linebreakr.
%
% Detects and avoids oscillation as in:
% height fontsize quotient linect
% 708 26 27.2 26
% 708 27 26.2 27
/adjustfont {
0 begin
/worddelim exch def
/pgtext exch def
/artbox exch def
/fontname exch def
/fsMin 6 def
/fsMax 144 def
/fontsize 1 def
/ABX artbox 0 get def
/ABY artbox 1 get def
/ABW artbox 2 get def
/ABH artbox 3 get def
TRACE { % if
% outline rectangle where text goes
gsave
.82 setgray artbox rectstroke
grestore
% artbox coords
% ... N ncat
(artbox:)
( x=) ABX strN
( y=) ABY strN
( w=) ABW strN
( h=) ABH strN
( x+w=) ABX ABW add strN
( y+h=) ABY ABH add strN
(\n)
14 ncat Trace
% fontsize computation table header
(szrg\tftsz\tlnct\th/ftsz\ttextw\n) Trace
} if
{ % loop
/lastfs fontsize def
% prefer smaller font size (using idiv)
/fontsize fsMin fsMax add 2 idiv def
fontname fontsize selectfont
% count lines by splitting text using current font
pgtext worddelim ABW { pop pop } linebreakr
/linect exch def
/lineqt ABH fontsize idiv def % floor(ABH / fontsize)
TRACE { % if
% fontsize computation table row
/textwd pgtext stringwidth pop def % width in current font
% ... N ncat
fsMin strN (:) fsMax strN
(\t) fontsize strN
(\t) linect strN
(\t) lineqt strN
(\t) textwd cvi strN
(\n)
12 ncat Trace
} if
lineqt linect sub
dup 0 eq % success
fontsize lastfs eq or % guard against infinite loop
{ pop exit } if
0 lt { /fsMax fontsize def
}{ /fsMin fontsize def
} ifelse
} loop
fontsize linect % return values
end % dict
} def
/adjustfont load 0 16 dict put
% Encode named font: fontname encid --> encfontname
% where encid is 0 StandardEncoding, 1 Latin-1, or 2 Latin-2
% e.g. /Helvetica 1 --> /encft1Helvetica
% origin of /encvec table: https://stackoverflow.com/a/14866794
/encodefont {
0 begin
/encid exch def
/fontnm exch def
/myfontnm {
(encft)
encid strN
fontnm 64 string cvs
3 ncat
} def
/encvec encid 1 eq
{ ISOLatin1Encoding }
{ StandardEncoding } ifelse
def
encid 2 eq { % if
/encvec
% Latin-2: first 144 entries same as in ISO Latin-1
ISOLatin1Encoding 0 144 getinterval aload pop
% \22x
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
% \24x
/nbspace /Aogonek /breve /Lslash /currency /Lcaron /Sacute /section
/dieresis /Scaron /Scedilla /Tcaron /Zacute /hyphen /Zcaron /Zdotaccent
/degree /aogonek /ogonek /lslash /acute /lcaron /sacute /caron
/cedilla /scaron /scedilla /tcaron /zacute /hungarumlaut /zcaron /zdotaccent
% \30x
/Racute /Aacute /Acircumflex /Abreve /Adieresis /Lacute /Cacute /Ccedilla
/Ccaron /Eacute /Eogonek /Edieresis /Ecaron /Iacute /Icircumflex /Dcaron
/Dcroat /Nacute /Ncaron /Oacute /Ocircumflex /Ohungarumlaut /Odieresis /multiply
/Rcaron /Uring /Uacute /Uhungarumlaut /Udieresis /Yacute /Tcedilla /germandbls
% \34x
/racute /aacute /acircumflex /abreve /adieresis /lacute /cacute /ccedilla
/ccaron /eacute /eogonek /edieresis /ecaron /iacute /icircumflex /dcaron
/dcroat /nacute /ncaron /oacute /ocircumflex /ohungarumlaut /odieresis /divide
/rcaron /uring /uacute /uhungarumlaut /udieresis /yacute /tcedilla /dotaccent
256 packedarray def
} if
fontnm findfont % load the font
0 dict copy begin % copy it to a new dictionary
/Encoding encvec def % replace encoding vector
myfontnm /FontName def % replace font name
currentdict end
dup /FID undef % remove internal data
myfontnm exch definefont pop % define the new font
myfontnm % return value
end % dict
} def
/encodefont load 0 4 dict put
% Justify text vertically by adjusting font size, centre horizontally,
% and display in artbox.
% stack: pagetext fontname mediabox artbox rgbtext rgbbkg --> ...
/vert-centr {
15 dict begin
/rgbbkg exch def
/rgbtext exch def
/artbox exch def
/mediabox exch def
/fontname exch def
/pgtext exch def
/worddelim ( ) def
/ABX artbox 0 get def
/ABY artbox 1 get def
/ABW artbox 2 get def
/ABH artbox 3 get def
rgbbkg {255 div} forall setrgbcolor mediabox rectfill
rgbtext {255 div} forall setrgbcolor
% adjust font size, select font, centre text vertically
fontname artbox pgtext worddelim adjustfont
/lnct exch def
/fontsize exch def
/lnyadj ABH fontsize lnct mul sub lnct div def % even out excess
/lnypos ABH ABY add lnyadj add 4 add cvr def % +4 looks better
(lnypos:) Trace
% split text into lines and display
% args: pagetext delimiter maxlinewidth eolproc
pgtext worddelim ABW {
% eolproc: linetext linewidth --> ...
/lnypos lnypos fontsize sub lnyadj sub def
ABX lnypos cvi moveto
% centre text horizontally
ABW sub -2 div 0 rmoveto show
( ) Trace lnypos strN Trace
} linebreakr pop
(\n) Trace
showpage
end % dict
} def
% ---- startup code here ----
Sample startup code:
%%Page: 1 1
/TRACE true def
(OBS! ~ Tåg till Göteborg avgår inom fyrtiofem minuter)
/Helvetica 1 encodefont
[0 0 595 842] [71 67 453 708 ] [0 0 0] [252 250 243]
vert-centr
%%Trailer
File: onepg.sh
#! /bin/sh
# Use ghostscript 9.50 to run onepg.ps
# e.g.
# echo 'Train in 45 min' | ./onepg.sh > msg.pdf
# tocode=latin2 rgbtext='0 0 255' ./onepg.sh < some.txt > some.pdf
# TRACE=x logfile=file.log outfile=file.pdf ./onepg.sh < file.txt
# infile=the.txt outfile=the.png devWpts=1600 devHpts=900 ./onepg.sh
# shellcheck disable=SC2223,SC2046,SC2086
## Set default values
: ${progps='./onepg.ps'} ## PostScript program file
: ${TRACE=} ## non-empty to trace to ${logfile}
: ${psoutfile=} ## non-empty to emit raw PostScript
: ${infile='/dev/stdin'} ## source text
: ${outfile='/dev/stdout'} ## destination, e.g. my.pdf or my.ps
: ${logfile='/dev/stderr'} ## e.g. my.trace.log or %stderr
: ${fromcode='UTF-8'} ## encoding of ${infile}
: ${tocode='ASCII'} ## ASCII | LATIN1 | LATIN2
: ${PAPERSIZE='a4'} ## see `man paperconf`
: ${marginx=.12} ${marginy=.08} ## page margins (.08 = 8%)
: ${landscape=} ## non-empty for landscape orientation
: ${fontname='Helvetica'} ## font name
: ${rgbtext='0 0 0'} ## text colour RGB
: ${rgbbkg='252 250 243'} ## background colour RGB
## Set up arguments
case ${tocode} in
(LATIN2|latin2) encid=2 tocode='LATIN2//TRANSLIT' ;;
(LATIN1|latin1) encid=1 tocode='LATIN1//TRANSLIT' ;;
(ASCII|ascii|*) encid=0 tocode='ASCII//TRANSLIT' ;;
esac
case ${outfile} in
(*.jpeg) gsdevice='jpeg' ;;
(*.null) gsdevice='nullpage' ;;
(*.pdf) gsdevice='pdfwrite' ;;
(*.png) gsdevice='png16m' ;;
(*.ps) gsdevice='ps2write' ;;
(*.txt) gsdevice='txtwrite' ;;
(*) gsdevice='pdfwrite' ;;
esac
case ${PAPERSIZE} in
## portrait mode width and height dimensions in points
(letter)
: ${devWpts=612} ${devHpts=792} ;;
(a5) : ${devWpts=420} ${devHpts=595} ;;
(a4) : ${devWpts=595} ${devHpts=842} ;;
(a3) : ${devWpts=842} ${devHpts=1191} ;;
(*) if test -z "${devWpts}"; then
set -- $(LC_NUMERIC=C printf '%.0f ' \
$(paperconf -p "${PAPERSIZE}" -w -h))
devWpts="$1" devHpts="$2"
fi ;;
esac
if test "${landscape}"
then _tmp="${devWpts}" devWpts="${devHpts}" devHpts="${_tmp}"
_tmp="${marginx}" marginx="${marginy}" marginy="${_tmp}"
fi
mediabox2artbox() { ## x=$1 y=$2 w=$3 h=$4
set -- "$3*$marginx" "$4*$marginy" "$3-$3*$marginx*2" "$4-$4*$marginy*2"
printf '(%s+0.5)/1\n' "$@" | bc | paste -s -d ' ' -
}
: ${mediabox="0 0 ${devWpts} ${devHpts}"} ## x y width height
: ${artbox="$(mediabox2artbox ${mediabox})"} ## same, within margins
## Emit PostScript, run ghostscript
{ cat << ENDCMT
%!PS-Adobe-2.0
%%BoundingBox: ${mediabox}
%%Creator: ${0##*/}
%%Pages: 1
%%Title: ${infile%.*}
%%EndComments
ENDCMT
## copy program stripping non-DSC comments and indentation
sed -e '/^%%/! s/[[:blank:]]*%[^%]*$//' \
-e 's/^[[:blank:]]*//' -e '/./!d' "${progps}"
## startup code
cat << HERE
%%Page: 1 1
${TRACE:+/TRACE true def}
HERE
## convert text to 8-bit PostScript string,
## escape backslashes, paren:s, and newlines, enclose in paren:s
iconv -f "${fromcode}" -t "${tocode}" < "${infile}" |
sed -e 's/[\\()]/\\&/g' -e '$!s/$/\\/' -e '1s/^/(/' -e '$s/$/)/'
cat << ENDPS
/${fontname} ${encid} encodefont
[${mediabox}] [${artbox}] [${rgbtext}] [${rgbbkg}]
vert-centr
%%Trailer
ENDPS
} |
tee ${psoutfile:+"${psoutfile}"} |
gs -q -dBATCH -dNOPAUSE \
-dDEVICEWIDTHPOINTS="${devWpts}" \
-dDEVICEHEIGHTPOINTS="${devHpts}" \
-sDEVICE="${gsdevice}" \
-sOutputFile="${outfile}" \
${logfile:+-sstdout="${logfile}"} \
-f /dev/stdin
Solution 2:[2]
I came up with the following hack.
- Create an SVG vector image.
convert \
  -gravity center \
  -background white \
  -fill black \
  -size 1728x972 \
  -extent 1920x1080 \
  -font ~/Library/Fonts/RobotoMono-Medium.ttf \
  caption:"Lorem ipsum" \
  lorem.svg
- Convert it to a vector PDF. Note that generating a PDF directly with convert doesn't work because the file would just be an embedded bitmap image.
svg2pdf lorem.svg lorem.pdf
- Use ocrmypdf to add a layer of text. This step is necessary because the PDF from the previous step is just a vector image of letter shapes, unlike a PDF rendered by LaTeX etc.
ocrmypdf -l pol+eng --output-type pdfa --clean lorem.pdf lorem-ocr.pdf
Hacky as hell but gets the job done.
The proper solution would involve somehow accessing the ImageMagick internal layout engine and capturing its output before it's converted into a bitmap.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Marek Kowalczyk |

