
![]() |

I'm sure this has been asked before, but are the PDFs of old WotC products plain scans or do they include OCRed text? Would I be able to cut and paste text from them? I'm specifically looking at the late 2nd Edition Greyhawk products. Thanks.
Most are OCRd, and are thus copy-and-pasteable—but some are better than others. (Some were actually generated from layout files, so they're pretty much perfect.)
-Vic.

priam |

Most are OCRd, and are thus copy-and-pasteable—but some are better than others. (Some were actually generated from layout files, so they're pretty much perfect.)
Is there a list somewhere of which products have been OCRd? Also is there a list of which products will be OCRd in the future? Can requests be made for certain products to be OCRd?
Thank you.

![]() |

Is there a list somewhere of which products have been OCRd? Also is there a list of which products will be OCRd in the future? Can requests be made for certain products to be OCRd?
We don't have a list of how any particular products were created; the only way I can think of to compile such a list would be to open them all and take a guess.
As for future OCR upgrades, I can tell you that it's not as simple as just running the current scans through the Acrobat Pro OCR function—many of the scans don't have the appropriate resolution or contrast; I suspect that they'd have to be entirely redone. We're not making those sorts of changes to Wizards' products; neither are we aware of plans Wizards (or anyone else) might be making to do that.
-Vic.
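
For anyone who does want to "open them all and take a guess," a rough first pass can be automated. The Python sketch below is only a heuristic and rests on an assumption: a scan with no text layer normally needs no font resources, so the raw file never mentions `/Font`, while an OCRed or layout-generated PDF does. Newer PDFs can store their dictionaries in compressed object streams, in which case this check will miss them. The helper names `has_font_resources` and `looks_ocred` are made up for illustration.

```python
def has_font_resources(pdf_bytes: bytes) -> bool:
    """Crude check: does the raw file mention a /Font resource anywhere?

    Assumption: image-only scans carry no fonts, so no /Font key appears.
    Files that keep their dictionaries in compressed object streams can
    produce false negatives.
    """
    return b"/Font" in pdf_bytes


def looks_ocred(path: str) -> bool:
    """Read a PDF from disk and apply the heuristic above."""
    with open(path, "rb") as handle:
        return has_font_resources(handle.read())
```

A "yes" here only means the file probably contains some text; it says nothing about how good the OCR is.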

priam |

We don't have a list of how any particular products were created; the only way I can think of to compile such a list would be to open them all and take a guess.
Thanks for the reply. I guess one solution is to ask around and hope someone who bought the product can say whether or not it is OCRed.
Printing OCRed PDFs is a lot easier and more ink-friendly than printing the non-OCRed ones.

![]() |

I guess one solution is to ask around and hope someone who bought the product can say whether or not it is OCRed.
If you have specific products you're curious about, I can check 'em.
Printing OCRed PDFs is a lot easier and more ink-friendly than printing the non-OCRed ones.
That shouldn't make any difference—the OCR text layer is an invisible text layer that's aligned with the scanned page, so OCR or no, you're just printing the scanned image.
A bigger factor on ink use will be the white balance of the scan—and many of the D&D PDFs have lousy white balance (meaning what's supposed to be white is a bit grey).
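
To make the white balance point concrete, here is a small Python sketch of a levels adjustment. It assumes the page is already available as a grid of 8-bit grey values (0 = black, 255 = white) and that most of the page is blank paper, so the median value is a fair guess at the paper tone. The function names are hypothetical, and this is not a description of how any of the actual scans were processed.

```python
def estimate_white_point(pixels):
    """Guess the paper tone as the median grey level on the page.

    Assumption: more than half of the page is background, not ink.
    """
    flat = sorted(value for row in pixels for value in row)
    return flat[len(flat) // 2]


def stretch_to_white(pixels, white_point):
    """Rescale grey levels so that white_point maps to pure white (255)."""
    if white_point <= 0:
        return [list(row) for row in pixels]
    scale = 255 / white_point
    return [[min(255, round(value * scale)) for value in row] for row in pixels]


# A tiny "page": greyish paper (200) with one dark ink pixel (30).
page = [
    [200, 200, 200],
    [200, 30, 200],
    [200, 200, 200],
]
balanced = stretch_to_white(page, estimate_white_point(page))
```

After the stretch the paper prints as true white, so the printer lays down ink only for the text and art.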

priam |

That shouldn't make any difference—the OCR text layer is an invisible text layer that's aligned with the scanned page, so OCR or no, you're just printing the scanned image.
A bigger factor on ink use will be the white balance of the scan—and many of the D&D PDFs have lousy white balance (meaning what's supposed to be white is a bit grey).
Yes, but you can copy/paste the text into a word processing program and save ink by not printing superfluous pictures.

priam |

OK, my last post must not have gone through.
I've found a way to print readable PDFs.
First, I convert the PDF to image files (BMP, TIFF, etc.).
Second, I convert the image files to black/white.
This really helps when you have pages with black text on a grey background. The grey background is converted to a white background, which makes the printed pages readable.
If there's interest, I could post more details on the process.
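
In the meantime, here is a minimal Python sketch of what the black/white step could look like. It is not the exact process described above (those details were not posted); it assumes each page has already been read into a grid of 8-bit grey values (0 = black, 255 = white), and `pick_threshold` and `to_black_and_white` are hypothetical helper names.

```python
def pick_threshold(pixels):
    """Use the midpoint between the darkest and lightest grey on the page."""
    flat = [value for row in pixels for value in row]
    return (min(flat) + max(flat)) // 2


def to_black_and_white(pixels, threshold):
    """Map every pixel lighter than the threshold to white, the rest to black."""
    return [[255 if value > threshold else 0 for value in row] for row in pixels]


# Grey background (180) with two dark text pixels (20 and 25).
page = [
    [180, 180, 180, 180],
    [180, 20, 25, 180],
    [180, 180, 180, 180],
]
clean = to_black_and_white(page, pick_threshold(page))
```

The midpoint rule is the simplest choice that works when a page has just two dominant tones; pages with heavy artwork would likely need a smarter threshold.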

![]() |

Vic Wertz wrote: If you have specific products you're curious about, I can check 'em.
I would like to know if Oriental Adventures (for first edition AD&D) has OCR or not. Thanks!
It has been OCR'd, but the scans are not great, so the OCR is poor, and will require editing to be at all useful. An excerpt:
"Every character in an Oliental ADdP adventure must be a member of
a class. A character's class is analogous to hls profession; it provides a
niche for the character to operate wcthln. Players have 10 classes to
choose from: barbarian, bushi. kensai, monk. ninja. samurai. shukenja.
sonei. wu jen, and yakuza"
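
As a rough idea of what that editing could involve, the Python sketch below flags words that are not in a known word list and proposes the closest match using the standard library's `difflib`. The word list here is a tiny hypothetical one chosen for the excerpt above; a real cleanup would need a proper dictionary plus game-specific terms, and every suggestion would still need a human check.

```python
import difflib

# Hypothetical vocabulary, just large enough for this illustration.
KNOWN_WORDS = ["Oriental", "character", "class", "his", "profession", "within"]


def suggest_corrections(text, vocabulary, cutoff=0.6):
    """Return {suspect_word: closest_known_word} for words not in vocabulary."""
    suggestions = {}
    for token in text.split():
        word = token.strip('.,;:"')
        if not word or word in vocabulary:
            continue
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        if matches:
            suggestions[word] = matches[0]
    return suggestions


sample = "Oliental hls wcthln class"
for wrong, guess in suggest_corrections(sample, KNOWN_WORDS).items():
    print(f"{wrong} -> {guess}?")
```

Words with no sufficiently close match are simply left out of the result, so this only helps with near-misses like single swapped letters.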

Sean Mahoney |

First, I convert the PDF to image files (BMP, TIFF, etc.).
Second, I convert the image files to black/white.
This really helps when you have pages with black text on a grey background. The grey background is converted to a white background, which makes the printed pages readable.
The images embedded in the PDF are already image files, so converting just ensures that no text can print as such. The process above SHOULD give the same output as simply changing to greyscale mode in your print preferences.
The exception would be if you mean the images should be changed to Bitmap mode, which saves only black information: anything that becomes white is just a non-printing space, and all other areas are pure black. Depending on the settings, this gives you pictures similar to how a newspaper prints them, a sort of pointillism. If you do this, then you definitely want to do two things: interpolate the image to a higher resolution (600 is good; normally you wouldn't do this, but it works great in this process), and then save the file as a TIFF with LZW compression. These files can print at extremely high resolution and look really good, but are still very small.
Other interesting info on them: they have transparent backgrounds if 'placed' in a graphic layout program (the white areas are just non-printing), and those same programs can change their color very nicely. I use this method quite a bit for other things, including the WotC scan PDF I created for them during that whole project.
Sean Mahoney
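
The interpolate-then-bitmap idea can be sketched in plain Python. This is an illustration under stated assumptions, not the workflow of any particular image editor: the page is a grid of 8-bit grey values, the upscale uses simple bilinear interpolation by a whole-number factor, and the final step packs the result at 1 bit per pixel to show why such files stay small. Writing an actual LZW-compressed TIFF is left out, and all function names are hypothetical.

```python
def upscale_bilinear(pixels, factor):
    """Enlarge a greyscale grid by an integer factor using bilinear interpolation."""
    height, width = len(pixels), len(pixels[0])
    result = []
    for y in range(height * factor):
        # Map the output pixel centre back into source coordinates.
        sy = min(max((y + 0.5) / factor - 0.5, 0), height - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, height - 1)
        fy = sy - y0
        row = []
        for x in range(width * factor):
            sx = min(max((x + 0.5) / factor - 0.5, 0), width - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, width - 1)
            fx = sx - x0
            top = pixels[y0][x0] * (1 - fx) + pixels[y0][x1] * fx
            bottom = pixels[y1][x0] * (1 - fx) + pixels[y1][x1] * fx
            row.append(round(top * (1 - fy) + bottom * fy))
        result.append(row)
    return result


def to_bilevel(pixels, threshold=128):
    """Return 1 for ink (darker than threshold) and 0 for blank paper."""
    return [[1 if value < threshold else 0 for value in row] for row in pixels]


def pack_bits(rows):
    """Pack each row of 0/1 values into bytes, 8 pixels per byte, first pixel in the high bit."""
    packed = bytearray()
    for row in rows:
        for start in range(0, len(row), 8):
            chunk = row[start:start + 8]
            byte = 0
            for bit in chunk:
                byte = (byte << 1) | bit
            byte <<= 8 - len(chunk)  # pad a short final chunk with blank pixels
            packed.append(byte)
    return bytes(packed)


scan = [[255, 255], [0, 0]]          # white on top, black below
large = upscale_bilinear(scan, 4)    # 2x2 becomes 8x8
bitmap = pack_bits(to_bilevel(large))
```

Upscaling before thresholding matters because the interpolated greys let the black/white edge fall between the original pixels, which is what keeps text edges from looking blocky once everything is forced to pure black or nothing.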