Black Backgrounds in images from PDF files


Website Feedback


After trying to copy pawn images from the PDF file and getting black backgrounds, I found a few old threads about the issue.

http://paizo.com/threads/rzs2q50z?Black-Background-in-Copy-and-Pasted-Images
http://paizo.com/threads/rzs2lvmp?Help-with-copying-images-out-of-PDFs
http://paizo.com/threads/rzs2nff1?Black-background-problem

While a couple of the suggestions did sort of work, they weren't perfect and I couldn't find a satisfying explanation as to why. It seemed more like some arcane ritual than a technological operation. I did some more digging, and I thought I'd post what I found in case someone finds it helpful. Maybe this is all well-known, but it didn't pop right out in my search.

First, I found the following post that states that images in a PDF file do not have an alpha channel. Instead, a second image is used as a mask. The linked specification document was too dense for me, but this would make sense. If some programs copy the image without merging the mask, they would lose the alpha channel. This would also explain why extraction with pdfimages outputs two files for an image.

https://groups.google.com/forum/?_escaped_fragment_=topic/pdfnet-sdk/H8n_aNJLtNc#!topic/pdfnet-sdk/H8n_aNJLtNc
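If you want to check whether a particular PDF really stores its transparency as separate masks, pdfimages has a -list option that prints one row per image with a "type" column (image, mask, smask, stencil). Here is a rough sketch that tallies that column. The function name is my own, and it assumes the listing begins with a header line and a dashed rule, which is what my copy of poppler prints.

```shell
#!/bin/bash
# Tally the "type" column (third field) of a `pdfimages -list` listing on stdin.
# Assumption: the first two lines are the column header and a dashed rule.
summarise_image_types() {
    awk 'NR > 2 { count[$3]++ }
         END { for (t in count) print t, count[t] }' | sort
}

# Usage (the file name is only an example):
#   pdfimages -list pawns.pdf | summarise_image_types
```

If the tally shows about as many smask rows as image rows, the black backgrounds are almost certainly the unmerged masks described above.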

Helpfully, someone posted a script for extracting and applying an alpha mask image.

https://gist.github.com/innermond/d110d6234123bf87cc04

The script makes some assumptions that won't work with Paizo's PDF files, but the key commands I adopted are:

Quote:


pdfimages -png -p <pdf file> <base name>
convert <image> <mask> -compose CopyOpacity -composite <new image>

The result is an image with a proper transparent background. I'm running these under Linux, but any program that extracts both the image and its mask can take the place of pdfimages. convert is a command-line tool from ImageMagick, but I imagine you could do the same thing manually in Photoshop or GIMP. Obviously I can't post any examples of the resulting image, but hopefully this is explicit enough for people to follow.
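Matching each image to its mask by hand gets tedious on a full page of pawns, so here is a sketch of how the pairing might be automated from the pdfimages -list output. It rests on two assumptions that you should check against your own files: that a soft mask is listed directly after the image it belongs to, and that pdfimages -png -p names its output base-PPP-NNN.png with three-digit page and image numbers. Both function names are made up for this example.

```shell
#!/bin/bash
# Read a `pdfimages -list` listing on stdin and print "page image_num mask_num"
# for every smask row that directly follows an image row on the same page.
# Assumption: a soft mask is listed immediately after its image.
list_mask_pairs() {
    awk 'NR > 2 {
             if ($3 == "smask" && prev_type == "image" && prev_page == $1)
                 print $1, prev_num, $2
             prev_page = $1; prev_num = $2; prev_type = $3
         }'
}

# Read those triples on stdin and merge each pair with ImageMagick.
# Assumption: files are named <base>-<page>-<num>.png, zero-padded to 3 digits.
merge_mask_pairs() {
    local base=$1 page img mask
    while read -r page img mask; do
        convert "$(printf '%s-%03d-%03d.png' "$base" "$page" "$img")" \
                "$(printf '%s-%03d-%03d.png' "$base" "$page" "$mask")" \
                -compose CopyOpacity -composite \
                "$(printf '%s-%03d-%03d-merge.png' "$base" "$page" "$img")"
    done
}

# Usage (file and base names are only examples):
#   pdfimages -png -p pawns.pdf pawns
#   pdfimages -list pawns.pdf | list_mask_pairs | merge_mask_pairs pawns
```

Images that have no mask are simply skipped, so anything that was already opaque in the PDF is left alone.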

Here are a couple of quick scripts I threw together using those commands. Sorry about the formatting, but it looks like Paizo doesn't accept [code] tags.
This one takes in the name of a pdf file and extracts the images into separate directories for each page:

Quote:


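#!/bin/bash
# Extract every image in a PDF into images/Page N/, one directory per page.
pdfFile=$1
if ! [[ -f $pdfFile ]]; then
    echo "$pdfFile is not valid"
    exit 1
fi
# Use an absolute path so the PDF can still be found after changing directory.
pdfPath=$(realpath "$pdfFile")
# Output base name: the PDF's file name without directory or extension.
baseName=$(basename "${pdfFile%.*}")
numPages=$(pdfinfo "$pdfFile" | awk '/^Pages:/ {print $2}')
if [[ -e images ]]; then
    if ! [[ -d images ]]; then
        echo '"images" must be a directory'
        exit 1
    fi
else
    mkdir images
fi
cd images || exit 1
for ((page=1; page<=numPages; page++)); do
    newDir="Page $page"
    mkdir -p "$newDir"
    # -f and -l limit extraction to this page; -p puts the page number in each file name.
    (cd "$newDir" && pdfimages -f "$page" -l "$page" -png -p "$pdfPath" "$baseName")
    # Remove the directory again if the page had no images.
    if [[ -z $(ls -A "$newDir") ]]; then
        rmdir "$newDir"
    fi
done
cd ..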

This one takes in the name of an image and a mask and merges them:

Quote:


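#!/bin/bash
# Merge an extracted image with its mask; writes <image name>-merge.png.
image=$1
if ! [[ -f $image ]]; then
    echo "$image is not a valid image filename"
    exit 1
fi

mask=$2
if ! [[ -f $mask ]]; then
    echo "$mask is not a valid mask filename"
    exit 1
fi

# CopyOpacity uses the mask's grayscale values as the image's alpha channel.
convert "$image" "$mask" -compose CopyOpacity -composite "${image%.*}-merge.png"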


I've solved this on a Mac with NO special software (except Acrobat).

1. Open the PDF in Acrobat.
2. Click on the image you want to copy.
3. Right-click and select "Copy Image".
4. Open Preview.
5. Select "File | New from Clipboard".

This creates a new image from the clipboard with a transparent background.

6. Click the "Show Markup Toolbar" button.
7. Select "Shapes" and create a white box that completely covers the image (yes, it will cover up the image you just pasted).
8. Select "Edit | Paste".

You should now have the image against a white background.

9. Select "File | Save...".

Boom, you have a PNG file with a background that you can use.

Alternatively, you can open a picture of a landscape in Preview and then paste the copied image onto that. Preview WILL handle it.
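For anyone not on a Mac, the white-box and landscape tricks above can probably be reproduced with ImageMagick once you have a PNG with working transparency. This is only a sketch: the function names are invented for the example, and it assumes the older convert command is installed (newer ImageMagick releases call it magick).

```shell
#!/bin/bash
# Put a transparent PNG on a plain white background.
# Arguments: source image, output image.
flatten_on_white() {
    local src=$1 dst=$2
    convert "$src" -background white -flatten "$dst"
}

# Paste a transparent PNG onto the centre of another picture.
# Arguments: background picture, transparent image, output image.
paste_on_background() {
    local background=$1 overlay=$2 dst=$3
    convert "$background" "$overlay" -gravity center -composite "$dst"
}

# Usage (file names are only examples):
#   flatten_on_white pawn-merge.png pawn-white.png
#   paste_on_background landscape.png pawn-merge.png pawn-scene.png
```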
