
Boomerang Nebula |

I was thinking about the Pathfinder 2.0 play test and I was wondering how many participants Paizo would need in order to get feedback that is representative of the entire population of Pathfinder players. It turns out that the number is surprisingly small if my base assumptions are correct. These are my assumptions:
* There are 1,000,000 (1 million) or fewer Pathfinder players.
* The people within the play test are a random selection of the main population.
Then if I select the following parameters:
* A confidence level of 99%, which means the true value will fall outside the stated range about 1% of the time.
* A confidence interval (margin of error) of 5%. This means that the true answer will fall within plus or minus 5 percentage points of the value obtained 99% of the time (since that is the confidence level selected).
Based on my assumptions and the criteria selected, the minimum number of play testers required is 665.
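For anyone who wants to check the arithmetic without the website, here is a minimal sketch of the usual sample-size calculation (Cochran's formula for a proportion with a finite population correction). Whether the linked calculator uses exactly this formula is an assumption on my part, and the function name `sample_size` is just illustrative. It assumes the worst case of a 50/50 split in answers.

```python
import math
from statistics import NormalDist


def sample_size(population, confidence, margin, p=0.5):
    """Minimum sample size for estimating a proportion.

    population: size of the whole group (assumed 1,000,000 here)
    confidence: two-sided confidence level, e.g. 0.99
    margin:     margin of error as a fraction, e.g. 0.05
    p:          assumed true proportion; 0.5 is the worst case
    """
    # Two-sided z-score for the chosen confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n0 = z**2 * p * (1 - p) / margin**2
    # Finite population correction (barely matters at 1 million).
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


print(sample_size(1_000_000, 0.99, 0.05))
```

With an unrounded z-score this sketch lands a person or so below 665. The small gap most likely comes from the calculator rounding the z-score (for example to 2.58), but that is a guess about its internals.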
To put this into context with an example: if Paizo were to ask 665 play testers "Should goblins be a core race?" and 80% responded "yes", then they could be 99% sure that asking all 1 million Pathfinder players the same question would produce a "yes" response in the range of 75% to 85%.
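The 75% to 85% range is the worst-case margin, which applies when answers split 50/50. As a rough sketch (same normal-approximation assumptions as the usual sample-size formula, with an illustrative function name `margin_of_error`), the margin for an observed 80% "yes" can be computed directly and comes out somewhat tighter:

```python
import math
from statistics import NormalDist


def margin_of_error(p, n, confidence, population=None):
    """Normal-approximation margin of error for an observed proportion p."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    moe = z * math.sqrt(p * (1 - p) / n)
    if population is not None:
        # Finite population correction; close to 1 when n << population.
        moe *= math.sqrt((population - n) / (population - 1))
    return moe


observed = margin_of_error(0.80, 665, 0.99, population=1_000_000)
worst_case = margin_of_error(0.50, 665, 0.99, population=1_000_000)
print(f"observed 80% yes: +/- {observed:.1%}")
print(f"worst case (50%): +/- {worst_case:.1%}")
```

So quoting plus or minus 5 points for an 80% result is on the safe side; the margin at 80% is closer to 4 points.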
The parameters I have chosen may not be accurate; this website has a calculator which lets you vary them:
https://www.surveysystem.com/sscalc.htm

Fuzzypaws |

The people within the play test are NOT a random selection of the main population. They are those who are most invested.
That said, yes, you don't actually need a ton of people to accurately represent a large population overall. And there will certainly be more than 665 testers, between the people active on these boards, the groups they game with, and all the silent folk who don't actually post on forums but will still take part in the playtest. (Heck, I've been silent for years and years but still actively GMed and bought products all this time.)

Boomerang Nebula |

There are a couple of problems with your assertions and even your example question is a strange choice, because most answers to that are probably independent from the playtest.
The only real problem is whether the group of play testers chosen is sufficiently random. As Fuzzypaws pointed out, participants of the play test are likely to be the most engaged people in the hobby so whatever question you pose needs to be independent of the level of engagement to obtain the most accurate results.
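To make the engagement point concrete, here is a toy calculation. Every number in it is made up purely for illustration (the share of highly engaged players, and how each group would answer); none of it is real survey data. It only shows how far a sample can drift when the answer depends on engagement and the sample over-represents engaged players.

```python
def expected_yes_rate(engaged_share, yes_engaged, yes_casual):
    """Expected 'yes' rate for a group with the given mix of player types."""
    return engaged_share * yes_engaged + (1 - engaged_share) * yes_casual


# Hypothetical numbers, chosen only to illustrate the effect.
YES_ENGAGED = 0.90   # assumed 'yes' rate among highly engaged players
YES_CASUAL = 0.40    # assumed 'yes' rate among everyone else

# Assumed: 10% of all players are highly engaged, but 80% of play testers are.
population_rate = expected_yes_rate(0.10, YES_ENGAGED, YES_CASUAL)
playtest_rate = expected_yes_rate(0.80, YES_ENGAGED, YES_CASUAL)

print(f"whole population: {population_rate:.0%} yes")
print(f"play test sample: {playtest_rate:.0%} yes")
```

No sample size fixes this kind of gap, because the margin of error only describes random sampling noise. If the two groups answered the same way, the two rates would match and the bias would vanish.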

Mathmuse |

The first step of a statistical analysis or data science project is to figure out what kind of analysis will satisfy the customer's needs. Boomerang Nebula started calculating the necessary sample size before asking the question, "What is being sampled?"
The playtest is not a sampling of Pathfinder players. The playtest is a test of a new gaming system. The game, not the players, has to be sampled. Hence, the total number of Pathfinder 1st Edition players is irrelevant.
The playtest ought to identify the places where the design of Pathfinder 2nd Edition does not match the design goals of Pathfinder 2nd Edition. See the Extra Credits video on Break Points for an example of how such disparities occur. Therefore, what the playtest is sampling is encounters in the game.
A party typically has ten encounters before leveling up, and if we want to test all 20 levels, the entire sample for one character would be 200 encounters. And then we want to test the variety of the characters, too. Do human paladins work as we expect? What about goblin paladins, then? Okay, we don't have expectations for goblin paladins, since that sounds almost like a contradiction, but will they be characters worth playing in PF2? Eight races, er I mean ancestries, and twelve classes make 96 ancestry-class combinations.
200 times 96 equals 19,200, but that ignores that each encounter involves four characters. Let's not get into the possible combinations of characters in a party. Instead, we make up reasonable parties with the roles filled, so we could look at 19,200/4 = 4,800 encounters.
However, statistical sampling assumes independence between samples. The results for a CR6 encounter with a particular group of four 5th-level characters are strongly correlated with the results for a CR5 encounter with the same group of characters at 4th level. Due to those correlations, we could make do with a much smaller sample.
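The back-of-the-envelope count above can be written out as a short script, so the inputs are easy to vary. The figures are the ones used in this post; the 20 levels is my reading of how ten encounters per level adds up to 200, and the variable names are only illustrative.

```python
ENCOUNTERS_PER_LEVEL = 10   # typical encounters before leveling up
LEVELS = 20                 # assumed: 10 encounters x 20 levels = 200
ANCESTRIES = 8
CLASSES = 12
PARTY_SIZE = 4

# Encounters needed to take one character through every level.
encounters_per_character = ENCOUNTERS_PER_LEVEL * LEVELS

# Distinct ancestry-class pairings to cover.
combinations = ANCESTRIES * CLASSES

# Total character-encounters if each pairing plays through every level.
character_encounters = encounters_per_character * combinations

# Each encounter covers a full party, so divide by party size.
party_encounters = character_encounters // PARTY_SIZE

print(encounters_per_character, combinations, character_encounters, party_encounters)
```

This is an upper-end estimate: it treats every encounter as an independent sample, and the correlation between consecutive levels for the same party means fewer would do in practice.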
Given the enthusiasm expressed about the PF2 playtest, we will have enough playtesters to cover all those encounters.