Paizo's stance on automatic scraping of PDF content for private games


Pathfinder Second Edition General Discussion


Hi all, this is potentially not the correct forum, but I could not find one which was fully applicable.

Does anyone know what Paizo's stance is on sharing tools that automatically extract content from their PDFs for use in private games? With more and more games going online, more people seem to be asking for automatic ways to get pawn art for monsters without having to manually copy every image from a pawn PDF. It is, for example, relatively simple to write a generic* scraper for the pawn PDFs that extracts each image and saves it under the name printed above it, so it is easy to use in an online game. The big question is whether Paizo sees sharing code like this as inappropriate.

Does anyone know how Paizo has previously ruled on automatic extraction of content from their pdfs?

* By generic I mean a scraper that is not coded in any way towards Paizo's PDFs specifically and can be used on a variety of different PDFs.


Pathfinder Adventure, Adventure Path, Lost Omens, PF Special Edition, Starfinder Adventure Path Subscriber

Paizo is definitely aware of using image extraction utilities on their PDFs and while the link I gave doesn't explicitly give permission to do it, it's solid evidence that Paizo thinks it's ok to do it for the purpose you posited (uploading the images for a VTT). I don't see anything in the community guidelines that would explicitly forbid sharing such a tool on these forums. However, I flagged this to be moved to Website Feedback so Paizo staff can chime in if they wish.

Liberty's Edge

2 people marked this as a favorite.

Given Paizo is purposely locking their PDFs (which is a deliberate choice with InDesign that some other publishers do and some do not) I'd assume they don't want you going into legally obtained PDFs and pulling out maps and art for monster & PC tokens.

Really, I'm surprised more companies haven't taken advantage of the rush in VTT to release token packs and/ or art packs. Paizo could easily sell versions of their maps optimized for VTTs.

Paizo Employee Designer

6 people marked this as a favorite.
Jester David wrote:

Given Paizo is purposely locking their PDFs (which is a deliberate choice with InDesign that some other publishers do and some do not) I'd assume they don't want you going into legally obtained PDFs and pulling out maps and art for monster & PC tokens.

Really, I'm surprised more companies haven't taken advantage of the rush in VTT to release token packs and/ or art packs. Paizo could easily sell versions of their maps optimized for VTTs.

We do!

Roll20, Fantasy Grounds, PDF to Foundry package

We've been working on making more maps and tokens available through VTT for over a year now. We just don't sell the VTT versions directly since we typically make the digital assets available to the individual VTT companies so they can make sure they're compatible with their systems and sell the best versions for their platform through their storefronts. We're a small company and VTTs aren't all standardized yet, so it's more efficient to provide our licensors with the packages and let them pre-optimize them to the platforms they're the experts on.

Liberty's Edge

1 person marked this as a favorite.
Michael Sayre wrote:
Jester David wrote:

Given Paizo is purposely locking their PDFs (which is a deliberate choice with InDesign that some other publishers do and some do not) I'd assume they don't want you going into legally obtained PDFs and pulling out maps and art for monster & PC tokens.

Really, I'm surprised more companies haven't taken advantage of the rush in VTT to release token packs and/ or art packs. Paizo could easily sell versions of their maps optimized for VTTs.

We do!

Roll20, Fantasy Grounds, PDF to Foundry package

We've been working on making more maps and tokens available through VTT for over a year now. We just don't sell them directly since we typically make the assets available to the individual VTT companies so they can make sure they're compatible with their systems and sell them directly through their storefronts. We're a small company and VTTs aren't all standardized yet, so it's more efficient to provide our licensors with the packages and let them pre-optimize them to the platforms they're the experts on.

Honestly, I was thinking more for homebrewing than running set APs.

The flip-tiles and flip-mats are great for this. (Although it's kinda annoying they're more expensive on Roll20 than getting a PDF on paizo.com.) Being able to buy something like the Interactive Maps would be handy for people who just need a dungeon or a hedge maze.

Or even someone playing, oh, 5th Edition who wants Paizo's great maps or monster art but doesn't need the full Bestiary in Roll20. Or anyone playing in Discord without a map who might just need monster images.

And, of course, I'm thinking ahead eight months to when I can play again, and how handy it is to just display a monster picture on a TV or hold up an iPad with a monster, without the text from the book surrounding it. Or, y'know, print as a handout.


I really think a lot of the needs of the community could be taken care of if some guidelines were developed with regards to what code we can share and develop when it comes to pdf extractions.

To give a very concrete example: extracting the pawn images with correct names from the pawn PDFs, thereby getting the art people are requesting for running on VTT, is extremely simple with a script because the name is printed on top of each image. But the big problem is whether we can develop and share this kind of script without breaking the fair use agreement from Paizo. Is it, for example, enough if a script has a watermark checker, as the PDF to Foundry package does, or is there a difference between maps and token art?


Pathfinder Roleplaying Game Superscriber

Pulling the art from AP's and bestiary is easy enough.

The pawn collections, while great physically, are terribly low-res when extracted for VTT.

I would like to say that if Paizo offered a hi-res add-on for a few bucks when buying other PDFs I would do it, but most of my games are physical and I don't have the space or money to build a fancy mixed reality gaming table.

Grand Archive

Pathfinder Pathfinder Accessories Subscriber; Pathfinder Roleplaying Game Superscriber
Waryyn wrote:

I really think a lot of the needs of the community could be taken care of if some guidelines were developed with regards to what code we can share and develop when it comes to pdf extractions.

To give a very concrete example: extracting the pawn images with correct names from the pawn PDFs, thereby getting the art people are requesting for running on VTT, is extremely simple with a script because the name is printed on top of each image. But the big problem is whether we can develop and share this kind of script without breaking the fair use agreement from Paizo. Is it, for example, enough if a script has a watermark checker, as the PDF to Foundry package does, or is there a difference between maps and token art?

Well. I have shared a link to a tool that extracts pictures from PDFs a couple of times on these forums, and I was never warned it was bad.

There is also a Paizo designer that posted a link to a Foundry Module that does *exactly* that, extracting all the images and text of a watermarked Paizo PDF and setting them up for use in the VTT (and yeah, it makes sure it's a watermarked PDF, so it rejects pirated ones, and if you can do it, it would probably be very appreciated by Paizo if you did too).
So I doubt Paizo would be against making and sharing that kind of tool, as they have yet to react to any of those that currently exist and are shared here in these forums.


Elfteiroh wrote:
Well. I have shared a link to a tool that extracts pictures from PDFs a couple of times on these forums

Incidentally, what are you using? I'm using `pdfimages` and it works, but it's not able to combine an image with its transparent mask, and I need to combine them manually. I have thought of automating it but I figure let's ask before I start writing code.
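
A minimal sketch of the combining step itself, in case it helps anyone automating it. It assumes the image has already been decoded into a flat list of (R, G, B) tuples and the mask into a flat list of 0-255 grayscale values of the same length, with the mask value used directly as alpha (higher means more opaque). The function name and data layout are made up for illustration; a real script would read and write the image files with an imaging library rather than work on plain lists.

```python
def apply_soft_mask(rgb_pixels, mask_pixels):
    """Merge RGB pixels with a grayscale mask into RGBA pixels.

    rgb_pixels:  list of (r, g, b) tuples, one per pixel.
    mask_pixels: list of 0-255 ints, one per pixel (assumed: 255 = opaque).
    """
    if len(rgb_pixels) != len(mask_pixels):
        raise ValueError("image and mask must have the same number of pixels")
    # The mask value simply becomes the fourth (alpha) channel of each pixel.
    return [(r, g, b, alpha) for (r, g, b), alpha in zip(rgb_pixels, mask_pixels)]


# Two-pixel example: the first pixel becomes fully transparent,
# the second stays fully opaque.
rgba = apply_soft_mask([(255, 0, 0), (0, 255, 0)], [0, 255])
```

If the image and its mask have different dimensions, the mask would need resizing first, which this sketch does not attempt.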

Grand Archive

Pathfinder Pathfinder Accessories Subscriber; Pathfinder Roleplaying Game Superscriber
Dr A Gon wrote:
Elfteiroh wrote:
Well. I have shared a link to a tool that extracts pictures from PDFs a couple of times on these forums
Incidentally, what are you using? I'm using `pdfimages` and it works, but it's not able to combine an image with its transparent mask, and I need to combine them manually. I have thought of automating it but I figure let's ask before I start writing code.

I'm using this one:

Image Extractor Post on Reddit
It's custom-made by the poster and opens a console window. You need to manually clean up the images as it extracts absolutely everything it finds that is not very small (sometimes it misses some very small actually useful images so I have made a copy of the launcher that doesn't delete small images, but I rarely need to use it, and it's WAY messier to use) so it ends up with a lot of "background textures" and some "masks" that I *think* are rotation masks? But yeah.

villadelfia wrote:

Download link

Put pdf(s) in the folder, run the .bat file, wait.

At a certain point it will ask you to remove files that aren't needed so that you're left with a directory full of images in the order "Image, Mask, Image, Mask, Image, Mask...". Do that and press enter, it will combine them then and quit after that.

The advantage it has over the Foundry module is that it's compatible with any PDF, but it's way messier (the Foundry Module goes around that by having data manually entered for each supported book on what can be extracted and what needs to be thrown away... So each time a PDF is updated, they also need to update the module, and the old version of the PDF becomes incompatible)
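
The "Image, Mask, Image, Mask..." ordering described above could be turned into pairs along these lines. This is only a sketch of the idea, not how that tool actually works: the function name is hypothetical, and it assumes the file names sort correctly as plain strings (for example, zero-padded numbers).

```python
def pair_images_with_masks(filenames):
    """Pair up files assumed to alternate image, mask, image, mask...

    Returns a list of (image_name, mask_name) tuples in sorted order.
    """
    ordered = sorted(filenames)
    if len(ordered) % 2 != 0:
        # An odd count means a stray file is still in the folder.
        raise ValueError("expected an even number of files (image + mask pairs)")
    # Even positions are images, odd positions are their masks.
    return list(zip(ordered[0::2], ordered[1::2]))


pairs = pair_images_with_masks(
    ["img-001.png", "img-000.png", "img-003.png", "img-002.png"]
)
```

Raising an error on an odd file count is a deliberate choice here: one leftover background texture would otherwise shift every later pair by one.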


Pathfinder Roleplaying Game Superscriber

You can use Nitro Reader (free PDF reader) and it can pull all the images for you. It's as easy as clicking a button. It will pull out a bunch of extras like images that create the borders of the page, but they're easy to delete.


Elfteiroh wrote:
Dr A Gon wrote:
Elfteiroh wrote:
Well. I have shared a link to a tool that extracts pictures from PDFs a couple of times on these forums
Incidentally, what are you using? I'm using `pdfimages` and it works, but it's not able to combine an image with its transparent mask, and I need to combine them manually. I have thought of automating it but I figure let's ask before I start writing code.

I'm using this one:

Image Extractor Post on Reddit
It's custom-made by the poster and opens a console window. You need to manually clean up the images as it extracts absolutely everything it finds that is not very small (sometimes it misses some very small actually useful images so I have made a copy of the launcher that doesn't delete small images, but I rarely need to use it, and it's WAY messier to use) so it ends up with a lot of "background textures" and some "masks" that I *think* are rotation masks? But yeah.

villadelfia wrote:

Download link

Put pdf(s) in the folder, run the .bat file, wait.

At a certain point it will ask you to remove files that aren't needed so that you're left with a directory full of images in the order "Image, Mask, Image, Mask, Image, Mask...". Do that and press enter, it will combine them then and quit after that.

The advantage it has over the Foundry module is that it's compatible with any PDF, but it's way messier (the Foundry Module goes around that by having data manually entered for each supported book on what can be extracted and what needs to be thrown away... So each time a PDF is updated, they also need to update the module, and the old version of the PDF becomes incompatible)

I also use the Image Extractor, and compared to the Foundry Module, I like the control it gives over what gets added; the Foundry PDF importer makes journal entries for every image, which you have to clean up manually. Of course, the converse is that you have to upload the images from the Image Extractor to Foundry yourself, so maybe it's a horse apiece. For Roll20, though, I found the Image Extractor to be a godsend.

Paizo Employee Director of Brand Strategy

7 people marked this as a favorite.
Elfteiroh wrote:

Well. I have shared a link to a tool that extracts pictures from PDFs a couple of times on these forums, and I was never warned it was bad.

There is also a Paizo designer that posted a link to a Foundry Module that does *exactly* that, extracting all the images and text of a watermarked Paizo PDF and setting them up for use in the VTT (and yeah, it makes sure it's a watermarked PDF, so it rejects pirated ones, and if you can do it, it would probably be very appreciated by Paizo if you did too).
So I doubt Paizo would be against making and sharing that kind of tool, as they have yet to react to any of those that currently exist and are shared here in these forums.

We are aware that such tools exist, and it doesn't make a lot of sense to pretend they don't or forbid people from talking about them, since a simple web search or visit to Reddit or other social media sites would reveal their existence in short order. That said, we'd prefer people buy content specifically designed and formatted for their VTT of choice when it exists, as producing that content costs us and our partners time and money to create. But we also recognize that we have produced more content over the last decade+ than we have available on Roll20, Fantasy Grounds, Foundry, etc.

In the absence of such natively available content, as long as users aren't sharing the assets and have come by them legally (by purchasing them from us, as we're the only ones who sell or distribute our PDFs) then what they do with those legally purchased assets is up to them. Even when we have an AP or map product available on a VTT marketplace, customers who own the PDFs of that same content can manually import them into the VTTs themselves. How they extract the images from their PDF is really up to them.

We've taken steps to make it easier to use our Pathfinder and Starfinder Flip-Mat and Flip-Tiles products in VTTs by providing "pre-extracted" JPGs of the content as part of customers' digital downloads. But for someone who doesn't want to have to manually add them to Roll20 or Fantasy Grounds or whatever, there are also options to purchase pre-loaded versions from those partners.

Now, if someone is using an image scraper to pull the art from one of their PDFs and putting it up on a searchable, open platform like Pinterest or whatever, that's a different issue. But for personal use, you can use whatever tools you have at your disposal to get the most use of your purchased content.


1 person marked this as a favorite.

An excellent response, Mark. Thank you so much for your time.

If I may suggest, having a version of the Bestiary Pawn Collections with VTT-friendly tokens/pogs might help that process for many VTT storefronts. I think having those for Pathfinder 2e and Starfinder would do wonders for VTTs, especially considering how some VTTs reuse tokens from completely different monsters to do the job.

Just something to throw out there. I know I wouldn't mind paying for those types of assets. It'd make my game sessions WAY easier.


nephandys wrote:
You can use Nitro Reader (free PDF reader) and it can pull all the images for you. It's as easy as clicking a button. It will pull out a bunch of extras like images that create the borders of the page, but they're easy to delete.

It does not extract images with transparency, so the tool only does half the job. (It extracts the image with its background, making it unsuitable for overlay.)


1 person marked this as a favorite.

@Mark Moreland, @Michael Sayre

Hey thanks. This seems like a very reasonable policy and a much better situation than some other games.


Pathfinder Roleplaying Game Superscriber
Dr A Gon wrote:
nephandys wrote:
You can use Nitro Reader (free PDF reader) and it can pull all the images for you. It's as easy as clicking a button. It will pull out a bunch of extras like images that create the borders of the page, but they're easy to delete.
It does not extract images with transparency, so the tool only does half the job. (It extracts the image with its background, making it unsuitable for overlay.)

That's all I ever need so it works well for me.


Thank you both Mark M and Michael S. And I greatly appreciate the effort across the board that has been put into making current/future releases VTT friendly. The pre-extracted images have been a huge help.

-----
RE: image extraction

I also use the same tool as Elfteiroh, though I use it for bulk stuff, like needing all the images from a certain PDF. It does pull out a lot, and you'll have to delete a bunch of the images/masks (sorting by size helps group a bunch of the "same" images, and then sorting by name helps clean up any remaining). Overall this is still quicker for me than the alternatives.
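
For anyone who would rather script the "same image" cleanup than sort by size by hand, here is a rough sketch. It groups files whose contents are byte-for-byte identical, which is stricter than matching on size alone. The function name is made up, and it takes a plain mapping of file name to bytes so it can be tried without touching any real folder.

```python
import hashlib
from collections import defaultdict


def group_identical(blobs):
    """Return groups of names whose contents are byte-for-byte identical.

    blobs: dict mapping a file name to that file's raw bytes.
    Only groups with more than one member are returned.
    """
    groups = defaultdict(list)
    for name, data in blobs.items():
        # Identical bytes always produce the same SHA-256 digest.
        groups[hashlib.sha256(data).hexdigest()].append(name)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


duplicates = group_identical(
    {"a.png": b"texture", "b.png": b"monster", "c.png": b"texture"}
)
```

On a real folder, the mapping could be built with something like `{p.name: p.read_bytes() for p in Path(folder).iterdir()}`. Images that look the same but were encoded differently would not be caught by this.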

The other option I use is Token Tool. I use this for more single-use cases. With Token Tool you can load in a PDF and then you can either make tokens directly from that or you can drag images out onto your desktop/folder/wherever.

Both of these tools will give you transparent backgrounds which is perfect for making your own tokens.

---

Re: VTT-ready Tokens

I would love for VTT-ready tokens to be sold, ideally based on the Bestiary Battle Cards, since those have individual monster art for each creature. They are great. And in the interim I highly recommend their digital downloads. Though Bestiary 1 is just individual JPGs, while the Bestiary 2 and NPC battle cards have both JPG and PDF. A PDF for Bestiary 1 would be great for getting transparent backgrounds.


1 person marked this as a favorite.
Pathfinder Lost Omens, Rulebook Subscriber
Michael Sayre wrote:
Jester David wrote:

Given Paizo is purposely locking their PDFs (which is a deliberate choice with InDesign that some other publishers do and some do not) I'd assume they don't want you going into legally obtained PDFs and pulling out maps and art for monster & PC tokens.

Really, I'm surprised more companies haven't taken advantage of the rush in VTT to release token packs and/ or art packs. Paizo could easily sell versions of their maps optimized for VTTs.

We do!

Roll20, Fantasy Grounds, PDF to Foundry package

We've been working on making more maps and tokens available through VTT for over a year now. We just don't sell them directly since we typically make the assets available to the individual VTT companies so they can make sure they're compatible with their systems and sell them directly through their storefronts. We're a small company and VTTs aren't all standardized yet, so it's more efficient to provide our licensors with the packages and let them pre-optimize them to the platforms they're the experts on.

What Roll20/Fantasy Grounds have and what Foundry VTT has are completely different. In the former cases, the adventure/system assets (in full quality) are given to the virtual tabletop creators and they are reformatted and remixed and then sold and redistributed in the virtual tabletop marketplace, likely with Paizo getting a cut (my guess). In Foundry VTT's case, the PDF importer is a fan-made module and uses the publicly purchasable PDFs that the creator buys with no support from Paizo itself, and the quality of the output of the module is limited by what's in the PDF without making significant changes that would require redistributing portions of the adventure that are not under the OGL/CUP and therefore violating copyright. The user of that system needs to buy the PDF and use the software that was created to import into Foundry VTT.

Grand Lodge

It's an interesting position to take given that the org play community is constantly sharing (and encouraged to share) prepared tables of content in Roll20, Foundry, etc. without any indication the recipient owns the content--content that includes custom/published maps and images/pitches that only exist in the published material.


TwilightKnight wrote:
It's an interesting position to take given that the org play community is constantly sharing (and encouraged to share) prepared tables of content in Roll20, Foundry, etc. without any indication the recipient owns the content--content that includes custom/published maps and images/pitches that only exist in the published material.

Fantasy Grounds takes copyright very seriously. OGL stuff is fair game with the appropriate license text, but sharing copyrighted stuff is a no-go on the forums. No one pretends piracy doesn't exist, but I can tell you from experience that the only reason you'd create your own module is because it doesn't exist officially yet. Getting the text into a VTT is... a chore. Images can be copied literally with the Windows cut and paste tool depending on what you need, and if you are doing it for personal use at a single table, and you take the time to do it for everything that exists... well, good on ya, because that's thousands of man-hours, even with image exports.


But does how easy the image extraction is change whether it is still within fair use? So far most of the examples given are rather time-consuming and cumbersome.

We can take the bestiaries as an example. Under the CUP we are allowed to use the names of creatures for community use. It is therefore possible to create a simple mapping of (page number, image number) -> creature name, and write an automated script that extracts these images and gives them the correct names. This script could easily be shared, so that everyone who owns the Bestiaries could easily create a folder of named images. Would a script like this, which actually has the names included, be OK? And would such a script require watermark checking to be within fair use, or is that not an actual requirement?

If that is within fair use I would be happy to create such a script, I have most of the code, and the mapping can be done almost automatically using image similarity search.
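
To make the (page number, image number) -> creature name idea concrete, here is a rough sketch of the renaming step only. Everything in it is hypothetical: the function names, the file names, and the placeholder creature names are made up for illustration, and it assumes some other tool has already extracted the images and recorded which page and position each one came from.

```python
import re


def safe_filename(name):
    """Turn a display name into something safe to use as a file name."""
    cleaned = re.sub(r"[^A-Za-z0-9]+", "_", name).strip("_")
    return cleaned or "unnamed"


def plan_renames(extracted, names):
    """Work out new file names without touching any files.

    extracted: list of (page, image_index, filename) tuples.
    names:     dict mapping (page, image_index) to a creature name.
    Returns a dict of old filename -> new filename; images with no
    entry in the mapping (borders, textures, masks) are left out.
    """
    plan = {}
    for page, index, filename in extracted:
        creature = names.get((page, index))
        if creature is None:
            continue
        # Keep the original extension; fall back to png if there is none.
        ext = filename.rsplit(".", 1)[-1] if "." in filename else "png"
        plan[filename] = f"{safe_filename(creature)}.{ext}"
    return plan


renames = plan_renames(
    [(12, 0, "img-000.png"), (12, 1, "img-001.png"), (13, 0, "img-002.jpg")],
    {(12, 0): "Example Creature A", (13, 0): "Example Creature B (Elite)"},
)
```

Returning a plan instead of renaming directly lets a user review it first. Only the mapping would be shared; the images themselves stay on the machine of whoever owns the PDF.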


If you want art for bestiary creatures it's in Nethys. The images in the books are higher resolution but they're often cropped to fit.

The easiest way to script getting art for monster manual books is to scrape Google Image Search or use the Bing API. I've done that myself. Those engines' algorithms are good enough that almost always grabbing the first 10 will get you what you were after. (EDIT: And of course, in publishing the script you aren't distributing anyone else's content).


Dr A Gon wrote:

If you want art for bestiary creatures it's in Nethys. The images in the books are higher resolution but they're often cropped to fit.

The easiest way to script getting art for monster manual books is to scrape Google Image Search or use the Bing API. I've done that myself. Those engines' algorithms are good enough that almost always grabbing the first 10 will get you what you were after. (EDIT: And of course, in publishing the script you aren't distributing anyone else's content).

I agree that this is by far the easiest solution, but I think normalizing this kind of script, where users can get the art for free, is dangerous, as it will be hard to transition users back to paying for art if they just get it for free. Paizo is extremely nice to allow AoN to host the art, but if everyone just starts scraping it for art, that may be reconsidered, as it limits their ability to earn money on it. For this reason, I think solutions for getting images onto a VTT should be based on PDFs, as these are at least bought from Paizo.


I think people are going to use search engines anyway to get extra art, so it's already "normalized behavior". As long as they're buying the PDF or book I don't see how it matters.

Then again, it's probably like freemium games - a lot play for free to make up the community numbers, and a few buy everything to keep the company profitable. I'd be interested to see sales figures for that, but that won't happen.

I have every rulebook (not setting/Golarion) as a GM and some of my players don't even have CRB (but most do).

EDIT: I don't know if this is normal or not but I don't usually buy scenarios or APs. Usually the players buy it and lend it to me to run it. I would have thought GMs were the ones buying adventures, we're probably weird! I'm currently GMing Abomination Vaults; my Tuesday group wanted to play it so I said if they bought it, I'd GM it, and that happened.


I don’t really know either, of course - but yes that sounds weird to me. :)

I think the players pitching in to buy the DM the adventure is how it should be but it doesn’t seem to be the norm.


I buy the PDF of the adventures and rulebooks. I use images from those in online games I run. When I need maps, tokens, art to show the players, etc, I think that counts as fair use.
