Jester David
I've noticed the text in this PDF is a little funky, with invisible spaces in the middle of words.
It doesn't affect reading, but copying text results in funkiness (like a poor OCR scan of a book). Searching for certain phrases also fails because the PDF thinks there's a space in the middle of a word, and being able to quickly Ctrl-F for a key word or phrase is very handy.
Has anyone else noticed this?
Chemlak
Anguish
There's a word for it that I can't remember right now.
Those are ligatures. I'm not sure where exactly the spaces come into things, but ligatures are "fused" letter combinations such as "fi", where in print the "i" may be tucked under the overhang of the "f" and they become a single character.
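To make the "single character" point concrete, here is a minimal Python sketch. It assumes the ligature ends up in the copied text as one of Unicode's ligature code points (U+FB01 for "fi"), which is only one of the ways a PDF can store it; the helper name `expand_ligatures` is made up for illustration.

```python
import unicodedata

# U+FB01 is the Unicode "fi" ligature: one code point drawn as two letters.
ligature = "\ufb01"
word = "de" + ligature + "ne"  # looks like "define" on screen

def expand_ligatures(text: str) -> str:
    """Replace compatibility characters such as ligatures with plain letters.

    NFKC normalization is used here as a convenient stand-in; it also folds
    other compatibility characters, not just ligatures.
    """
    return unicodedata.normalize("NFKC", text)

print(len(word), unicodedata.name(ligature))  # five code points, not six
print("define" in word)                       # plain search misses it
print(expand_ligatures(word) == "define")     # matches once expanded
```

That is why a plain search for "define" can miss text that looks identical on the page.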
Vic Wertz
Chief Technical Officer
Unfortunately, it's not our problem—it's a limitation of the PDF format.
Adobe insists that PDFs are a "destination file format," meaning being able to get information *out* of them is essentially an afterthought.
A really good example of how this impacts things is that they care more about what characters *look like* than they care about what those characters *are*. For example, you and I both believe that the difference between a lowercase letter and a capital letter is important, but when a PDF uses an all-caps font, Adobe has actually thrown away the knowledge of whether a character in that font was a lowercase character or an uppercase character. When you select the text so you can copy and paste it, they actually have to *guess* which it was, and they often guess wrong.
Similarly, when they encode letters like "fl" into a ligature, all they keep track of is where the "fl" glyph goes, and when you select the text (or when you search for a text string), they have to reverse-engineer the characters that make it up. While they seem to always get the characters themselves correct, they frequently screw up the spaces on one side or the other (or both).
To solve either of these problems, Adobe needs to be convinced that their "destination file format" concept is a load of crap that makes their products less useful.
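In the meantime, a reader-side workaround for both issues described above (lost case and stray spaces around ligatures) is to search the copied text loosely. The following is only a sketch under stated assumptions: the text has already been copied or extracted from the PDF into a Python string, and the function names `tolerant_pattern` and `find_phrase` are hypothetical, not part of any PDF tool.

```python
import re
import unicodedata

def tolerant_pattern(phrase: str):
    """Build a regex matching `phrase` even with stray spaces inside words.

    Optional whitespace is allowed between every pair of letters in a word,
    one or more whitespace characters are required between words, and case
    is ignored because the PDF may have guessed it wrong.
    """
    words = [
        r"\s*".join(re.escape(ch) for ch in token)
        for token in phrase.split()
    ]
    return re.compile(r"\s+".join(words), re.IGNORECASE)

def find_phrase(phrase: str, extracted_text: str) -> bool:
    """Return True if `phrase` occurs in the text under the loose rules above."""
    # Expand any Unicode ligature characters (e.g. U+FB02 "fl") first.
    text = unicodedata.normalize("NFKC", extracted_text)
    return tolerant_pattern(phrase).search(text) is not None

print(find_phrase("reflex save", "Re fl ex SAVE"))  # stray spaces, wrong case
print(find_phrase("reflex", "re\ufb02ex"))          # ligature code point
print(find_phrase("reflex", "fortitude"))           # unrelated text
```

The trade-off is false positives: because spaces inside words are ignored, a search for "into" will also match "in to".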