The new "whiteboarding": why DPR as a metric is broken.


 4 people marked this as a favorite.

EDIT: Note that I'm prone to overstating things. DPR isn't a useless metric; it's just no longer a useful metric in a vacuum. See the rest of the post...

PF2 got me thinking quite a bit about the assumption that it "doesn't work" for "theorycrafting" and "whiteboarding". There have been previous comments in a similar vein, and I want to state flat-out that these are wrong. There are a few things coming into play here:
1. Because the numbers are much tighter in 2e, variance plays a much bigger role.
2. Because of the way the action economy works, looking at "full attack every round" for computing efficacy is no longer the way to go about it (I'm looking at you Impossible Flurry!).
3. Because you can no longer one-shot an enemy in a single round, and mobility has improved, DPR is no longer a useful metric by itself. Defenses *really do* matter.

So there's not much to do about the first item on the list here. Previously you could pretty much guarantee a hit on your first attack via attack bonuses and whatnot. Now, the best you can do against a decent opponent is probably closer to 70-75% (buffs/flanking and legendary proficiency). But the rest, well, the rest is manageable, and I think it's worth looking at new approaches to discuss the efficacy of builds. This is still mathfinder, but I think our math just got a lot harder :-P.

Honestly I don't know what makes the most sense, but some thoughts:
1. Look at combinations of 1/2/3 action sets (some of this is happening).
2. Look for action economy "wins" (things like flurry of blows, sudden charge).
3. In evaluating a character numerically, consider not just the average damage but its standard deviation against a spread of ACs. Maybe include saving throws in these numbers...
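A minimal sketch of suggestion 3, computing a Strike's expected damage and its standard deviation across a few ACs under PF2's degrees of success. The attack bonus, average damage, and ACs below are made-up examples, and critical damage is simplified to straight doubling:

```python
# Sketch: expected damage plus spread for one Strike under a simplified
# PF2 model. Beating the AC by 10+ is a crit, a natural 20 upgrades the
# outcome one step, a natural 1 downgrades it. A crit is treated as
# exactly double average damage, which is a simplification.

from statistics import mean

def strike_damage_dist(attack_bonus, avg_damage, ac):
    """Return the damage outcome for each of the 20 faces of the d20."""
    outcomes = []
    for face in range(1, 21):
        total = face + attack_bonus
        if total >= ac + 10:
            degree = 2          # critical hit
        elif total >= ac:
            degree = 1          # hit
        else:
            degree = 0          # miss
        if face == 20:
            degree = min(degree + 1, 2)   # nat 20 upgrades one step
        elif face == 1:
            degree = max(degree - 1, 0)   # nat 1 downgrades one step
        outcomes.append(avg_damage * degree)
    return outcomes

for ac in (24, 27, 30):   # hypothetical enemy ACs
    dist = strike_damage_dist(attack_bonus=15, avg_damage=12, ac=ac)
    avg = mean(dist)
    sd = (mean(d * d for d in dist) - avg * avg) ** 0.5
    print(f"AC {ac}: expected {avg:.1f}, std dev {sd:.1f}")
```

The standard deviation here is routinely as large as the average itself, which is exactly the "variance plays a much bigger role" point above.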

 8 people marked this as a favorite.

I don't know, DPR has always been a flawed metric. Even in PF1 real world play varied significantly, and DPR was just a way to describe potential damage over a span to compare options.

In PF2, it could be Damage Per Action without changing much about how the math is used to feel out how different damage options compare in raw throughput.

What I think is going to be harder, and you touch on, is math to optimize defensive options. If you can say this action has X DPA and this defensive option has Y effective damage mitigation, you might be able to start to build a model for expected optimal action combinations. Obviously that's going to be nearly impossible.

@WatersLethe:

Agreed, it's always been a bit flawed, but I think in PF1 it was a good model because an archer could commonly drop an enemy in a single volley. I think we fell into the habit of measuring DPR for the non-caster types as a way to judge their efficacy. Honestly, it's a habit I had to get out of when I started playing 5e, where it's probably a more flawed metric than in PF1, but less so than in PF2.

I'd agree, defensive options are hard to measure, but at the very least, saying 'my character does X damage per action!' doesn't cut it anymore to prove that you have a good build, at least to me. You need to argue something about your character's ability to hold up to attacks, either through AC/saves or through Damage Reduction.

I'm sure we'll never have a way to say "character X is better than character Y", but at least being able to talk about relevant numbers would be good. Particularly with the "expected DC" tables, saying how your AC/saves compare to those numbers seems particularly useful.

 7 people marked this as a favorite.

I agree that DPR was never a perfect measuring stick, but I think it's a bit of a stretch to call it 'broken' either. It's still an absolutely useful way to compare characters of similar functionality. If you're comparing and contrasting Fighters and Rangers, whose primary combat roles are both hitting things really hard, it seems silly to not look at how much damage they can put out in various scenarios.

The biggest change is more that you no longer have one baseline, because full attacks aren't the be-all, end-all of martial combat anymore. That doesn't make damage comparisons useless, though; it just means you have more variables to look at.

 3 people marked this as a favorite.

Well, I think the biggest issue is that this is a cooperative game. As soon as players start learning how to synergize, their combat effectiveness shoots up. Maybe the fighter's defense vs. magic is his friend, the highly mobile grappling Monk.

Maybe there's a caster without a single primarily damaging spell. He can't kill anything, but he can earthbind flyers so slow melee can pound them, or grease the floor to cause chaos for the guard patrol chasing the party.

A rogue might not be able to take out a large group of enemies but maybe he helps the party sneak past them.

I remember reading during the playtest about one group with a druid who would use stone shape (or something like that) to make staircases, raising enemies high so the Monk could zoom up the stairs, grapple, and then fling the enemy off. No idea how you would calculate that, but it sounds fun, awesome, and funny.

A lot of being an effective character, to me at least, is figuring out how to be awesome with the party. A damage dealing power house can be amazing but it's only one part of a whole.

Hasn't there always been the caveat with "DPR" as a metric about things like "if you start adjacent" or "if you can sneak attack or not"? Which is to say, the numbers have never been all that reliable. Even in PF1 when we're talking about "archery" (far and away the most reliable combat style) there's a question of how many rounds are you going to spend buffing yourself that determines who is the best at archery.

PossibleCabbage wrote:
Hasn't there always been the caveat with "DPR" as a metric about things like "if you start adjacent" or "if you can sneak attack or not"? Which is to say, the numbers have never been all that reliable. Even in PF1 when we're talking about "archery" (far and away the most reliable combat style) there's a question of how many rounds are you going to spend buffing yourself that determines who is the best at archery.

So yeah, it's always been a rough metric. I just think it's a significantly worse rough metric in PF2 especially as compared to PF1. I think it works reasonably well as a sole metric for PF1 for character evaluation (not perfect, flawed, but good). I don't think it works at all for PF2 unless you consider a variety of other factors.

 1 person marked this as a favorite.

I think the way to analyze PF2 characters is to break down "what is the best use of all of your 3 actions in a bunch of different situations." It's just that you can't really reduce that to a single number.

 2 people marked this as a favorite.

The DPR calcs in this forum are already very "thorough"; they are done vs. a buncha different enemy ACs, with/without a buff, spending X amount of actions, etc. It's a lot of info, but it is being computed.

Are those conclusions not being felt in practice for you? For example, Fighter has the best DPR in basically every situation regardless of build. Are you seeing other martials do better than them?

Another point I remember is how cantrips generally do better than a weapon if you're a Sorc/Wiz, no matter how much you specialize in the weapon. Is this also not working as shown?

I think the conclusions drawn from the DPR threads here have been quite spot-on and in many cases have helped discover some of the more interesting builds.

So gonna need to see some evidence that they reach incorrect conclusions. However, if you say that maybe "Not all of the relevant information is being looked at", then you're probably right! There's probably some stuff not yet discovered because we aren't looking at the right data.

 1 person marked this as a favorite.

1. Because the numbers are much tighter in 2e, variance plays a much bigger role.

This has always been true for every edition of this game. Because it is founded on a few uniform dice, it does not use dice pools like wargames, which give a more Gaussian result where the average has meaning.

The tightness of the game just makes it even more true.

I challenge anyone doing these DPR simulations: you need an unrealistic number of trials (tens or hundreds of thousands) to get the average to settle into what the fractional odds tell you it should be.

This is the very nature of averaging random trials: the average only settles in after a very, very large number of them.

Instead, do say 200 rolls, which is a realistic amount for a level, then simulate many players doing that level and report the sigma or quartile variances.

I suspect player luck will dominate results more than DPR, and show these narrow spreads between build choices are meaningless.
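The per-level simulation proposed above can be sketched quickly. Everything here is a made-up example: 200 attack rolls per level, a 55% hit chance, and flat 10 damage per hit; the point is reporting the spread across simulated players rather than the infinite-trial average:

```python
# Simulate many players each playing one "level" (200 attack rolls)
# and report the spread of their damage totals, not just the mean.
# Hit chance and damage values are hypothetical.

import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

def player_level(rolls=200, hit_on=10, damage=10):
    """Total damage for one player's level worth of d20 rolls."""
    return sum(damage for _ in range(rolls)
               if random.randint(1, 20) >= hit_on)

totals = sorted(player_level() for _ in range(2000))
mean = statistics.fmean(totals)
sigma = statistics.stdev(totals)
q1 = totals[len(totals) // 4]
q3 = totals[3 * len(totals) // 4]
print(f"mean {mean:.0f}, sigma {sigma:.0f}, quartiles {q1}..{q3}")
```

With these numbers the interquartile spread is on the order of a hundred damage over the level, which is the kind of "player luck" band the post is describing.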

 1 person marked this as a favorite.

Player luck is not a factor in DPR calculations because "player luck" is a combination of multiple independent factors and biases. Using IRL rolls (assuming perfect/consistent environmental conditions and rolling technique) would only show how biased the dice are compared to the calculated DPR.

* Also a reminder that max/min DPR is not affected by rolls, as they use effectively static values. But those are more meant to show what the best/worst rounds are.

* * Note: I'm not saying the dice aren't important or that DPR is the end all be all.

 2 people marked this as a favorite.

If you look at my expected damage charts, they have 1-, 2-, and 3-action expected damage. It would also be cool to calculate what +/- 1 AC/attack is worth on average.

I think those are useful numbers to have

 1 person marked this as a favorite.
krazmuze wrote:
I suspect player luck will dominate results more than DPR, and show these narrow spreads between build choices are meaningless.

I don't think anyone's argued that die rolls don't play a significant part of how these numbers shake out, but I also think it's a little absurd to insinuate that numbers don't matter because someone might get lucky.

 2 people marked this as a favorite.

Then do the level simulation and prove me wrong. Every simulation I have seen either calculated the fractional odds, which are only exact for infinite trials, or simulated 50,000 runs to get a precise average, which is not the reality of any player's level.

The fact is that IF the variance is greater than the differences in average, then the build is not more important than the dice.

This is statistics 101

3.7+/-1 and 3.6+/-1.2

You cannot conclude that A is better than B; you must instead conclude that they are not significantly different, because the ranges of the averages overlap substantially.

3.7+/-0.1 and 3.4+/-0.15

You absolutely can conclude that A is better than B, because the averages do not overlap (to whatever confidence you calculated; usually 95% is used). You cannot, however, conclude by how much: it could be 3.6 vs. 3.55, or it could be 3.8 vs. 3.25.

It is this very gambler's fallacy, thinking the average odds apply to you, that makes Vegas rich. The house can play the averages (because they make all the plays); the player cannot (because they cannot play enough).

 5 people marked this as a favorite.
krazmuze wrote:
then the build is not more important than the dice.

Again, nobody is arguing that variance doesn't play a significant factor in how a game plays out. It's just that variance existing doesn't magically mean numbers no longer matter.

 3 people marked this as a favorite.
krazmuze wrote:

Then do the level simulation and prove me wrong. Every simulation I have seen either calculated the fractional odds, which are only exact for infinite trials, or simulated 50,000 runs to get a precise average, which is not the reality of any player's level.

The fact is that IF the variance is greater than the differences in average, then the build is not more important than the dice.

This is statistics 101

3.7+/-1 and 3.6+/-1.2

You cannot conclude that A is better than B; you must instead conclude that they are not significantly different, because the ranges of the averages overlap substantially.

3.7+/-0.1 and 3.4+/-0.15

You absolutely can conclude that A is better than B, because the averages do not overlap (to whatever confidence you calculated; usually 95% is used). You cannot, however, conclude by how much: it could be 3.6 vs. 3.55, or it could be 3.8 vs. 3.25.

It is this very gambler's fallacy, thinking the average odds apply to you, that makes Vegas rich. The house can play the averages (because they make all the plays); the player cannot (because they cannot play enough).

That builds of same-level characters are not so far apart in PF2 that you can decidedly say that one is always better than the other is not a bug. It is a most welcome feature.

 1 person marked this as a favorite.

Surely you mean to say 'you cannot' rather than 'you can'. The entire point statistics makes is that you cannot say one is better than the other.

Two encounters' worth of results, just rolled with real dice. Flat checks, because I do not want to write a fancy simulator. I did not even have to cherry-pick to find two runs that prove my point; the first two do.

20 13 4 9 1 3 17 11 17 13 16 20 - AVE 12, two crit successes, 1 fumble, 5 > DC 15
1 11 5 4 1 14 19 7 14 13 9 12 - AVE 9, no crit successes, 2 fumbles, 1 > DC 15

Persist that over a level and I doubt it is going to average out with 95% confidence that I can say this fighter is better than that ranger. In fact, if I took the fighter and had the second run, I would be pissed if my supposedly inferior ranger buddy were doing better than me. Gambler's fallacy... that difference of 3 in average, and 4 in reaching the DC, is much greater than most build differences.

I want to see people do simulations that report variations on a per-level basis, so that people can conclude when things are not significantly different, instead of saying that because this is 0.1 better it is a must-take gold option.

 9 people marked this as a favorite.
krazmuze wrote:
Gamblers fallacy...

Gambler's fallacy refers to the myth of 'maturity of chances'. The idea that if you roll a series of bad rolls your next one is more likely to be a good roll.

That has nothing to do with saying that 10 is larger than 8. That's more like basic arithmetic.

 1 person marked this as a favorite.

What's the point here? DPR analysis is not useless but not useful?

DPR or "theorycrafting" in general gives a pretty good insight into the system. PF2e is not rocket science. You want to play a damage focused character? Look into the damage charts. The end.

I don't get it... in every DPR thread someone comes along and mentions "it has nothing to do with the real game". DPR doesn't mean an overall better character, and no one said the opposite. But you can compare builds for 1/2/3 actions, vs. flat-footed, vs. low/high AC, and so on.

So, what's a better metric for damage output?

 7 people marked this as a favorite.
krazmuze wrote:
The entire point that statistics makes is that you cannot say one is better than the other.

Of course you can. If you compare 1d20+6 and 1d20+7, whatever you roll, the second is always better.

With the same dice rolls, a better build will outperform a worse one. If you're dealing 20% more damage, you deal 20% more damage, whatever you roll. You may have a variance, but you only need a few dozen rolls to clearly see the difference.
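The "same dice" point can be checked directly. The DC of 18 and the +6/+7 bonuses here are arbitrary examples:

```python
# Against the same sequence of d20 rolls, the +7 attacker succeeds on
# every roll the +6 attacker succeeds on (and sometimes more), so it
# can never trail on identical dice. Target DC 18 is hypothetical.

import random

random.seed(0)
rolls = [random.randint(1, 20) for _ in range(1000)]

hits_plus6 = sum(1 for r in rolls if r + 6 >= 18)
hits_plus7 = sum(1 for r in rolls if r + 7 >= 18)

# The higher bonus dominates roll-by-roll on identical dice.
assert all((r + 7 >= 18) or not (r + 6 >= 18) for r in rolls)
print(hits_plus6, hits_plus7)
```

Note this only settles the "same rolls" comparison; the thread's open question is different players with different rolls, where variance does re-enter.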

 1 person marked this as a favorite.
puksone wrote:

What's the point here? DPR analysis is not useless but not useful?

DPR or "theorycrafting" in general gives a pretty good insight into the system. PF2e is not rocket science. You want to play a damage focused character? Look into the damage charts. The end. ...

The point is that your conclusion is 92% wrong. You want to play a damage focused character, you need to worry about survivability as well as your ability to put out damage, or else everyone's going to be outdamaging you when your corpse is unable to act.

I think I stated pretty clearly what the point was in the original post. DPR isn't useless, but it can't be considered in a vacuum like you could get away with in PF1.

SuperBidi wrote:
krazmuze wrote:
The entire point that statistics makes is that you cannot say one is better than the other.

Of course you can. If you compare 1d20+6 and 1d20+7, whatever you roll, the second is always better.

With the same dice rolls, a better build will outperform a worse one. If you're dealing 20% more damage, you deal 20% more damage, whatever you roll. You may have a variance, but you only need a few dozen rolls to clearly see the difference.

I'd generally agree with you, but it's worth considering variance in all this. Even if one is better than the other, if that gets lost in the noise unless you roll the dice 5,000 times, then maybe it effectively "doesn't matter", as you're unlikely to make that many dice rolls over a character's lifetime.

As an addition, one thing that affects things that we haven't discussed is critical specializations. Even just looking at straight DPR over 3 actions, a fighter is going to have a better chance to critical, which for certain weapons can give them (and others) a better chance to hit on later attacks.

Don't get me wrong, I like the fact we're coming up with numbers for the game, and I'm actually glad this has garnered as much attention as it has, as my point wasn't to poo-poo number gathering, just to say that while DPR is a decent place to start, it's no longer a decent place to finish like it was in PF1. Even for a class that wants to deal straight-up damage, you need to consider a lot of other factors, which I think makes the game a lot more interesting. We just all need to do a lot more thinking about good ways to analyze builds.

Let me say that I love the 3-action system.

I noticed some flaws while thinking about a specific character, but I also managed to solve part of them in a party scenario.

Thinking about a single character, we can talk about his defense and attack.

At the moment of character creation, you are already set.

Want to be legendary in attack?
I am sorry, you are not a fighter.

Want to be legendary in defense?
I am sorry, you are not a champion or a monk.

The same goes for saving throws.

This can feel strict for a single character, and being forced into a specific class because you want to be a top-DPS or a frontline tank is, let's be honest, not the best we could have aimed for.

However, we can rely on our party, and pursue some ways to slightly enhance our priorities.

Canny Acumen will transform an expert saving throw into a master one, unfortunately only at level 17.

Alternatively, classes like rogue give you the possibility to hit master by level 12 instead.

You can combine both of them to get 3 master saving throws, or 2 master and 1 legendary, depending on your main class.

As for armor class, I am not sure what you can do yourself (increasing AC, not damage mitigation; these are 2 different things) apart from getting a full plate for a +1.

As for your attack bonus, heroism is a nice boost with a decent duration.

Remember that pure spellcasters can buff you to be more powerful, and a bard is definitely great with compositions.

You can also work toward some saving throw near-immunities (critical success on any roll but a 1) as you progress, while for others you would achieve a success on a 4 or more, and a critical on a 14 or more.

Think of these numbers as examples to point out that you can work on saving throws through racial traits, class feats, general feats, spells, and equipment.

Armor is different, and you will have to rely on your party:

- compositions
- Lay on Hands
- spells
- positioning
- reactions
- talents

But even so, it will be harder to work with AC numbers.

That is the case at least currently.
Remember that classes like barbarian and champion have class features which mention legendary proficiency, even though they can't reach it.

Maybe an errata, maybe more possibilities in the future.

 1 person marked this as a favorite.

I'd generally agree with you, but it's worth considering variance in all this. Even if one is better than the other, if that gets lost in the noise unless you roll the dice 5,000 times, then maybe it effectively "doesn't matter", as you're unlikely to make that many dice rolls over a character's lifetime.

You don't need 5000 rolls. A few dozen are enough, unless you hit an extremely lucky series. 5000 rolls are only useful if you need 0.1 precision.

There are of course other things to take into consideration: the conditions to use your DPR and the amount of resources (hp being one) that you need to use to apply it. This is why most of the DPR calculations consider fairly common situations and low resource use.

 1 person marked this as a favorite.

Re: Statistics
Variability is a large factor, not just average.
3d6 & 1d20 both average 10.5, so they would look the same on paper and in most calculations on these forums.
Yet if fighting minions, the certainty of 3d6 would triumph. If fighting a tougher enemy, you might want more variance if the middle of the bell curve doesn't favor you.

I expect to see this more with the Swashbuckler according to dev comments. It's sort of how Babe Ruth would always swing for home runs. That gives him an excellent home run average, as well as RBI, etc., while also making him a leader in strikeouts. Do we need a grand slam or do we simply need to get somebody on base? Maybe get extra bases through stealing instead? Could Babe land a bunt that's certain to drive in that winning run or can he only attempt do-or-die batting?
We see this a bit already w/ Picks & crit-fishing. Bad average damage w/o a crit, w/ crit spikes making picks look better "on average" than they perhaps are (unless you fight a lot of high hit point minions I guess). (Also note some creatures are immune to crits, so that makes a weakness for pick users.)

Also, when dealing w/ variables, it's important to distinguish between known variance, like when working with balanced dice, and unknown variance like when dealing w/ polls or research. The first shows how much fluctuation there is from the known quantity (typically the average in these calculations) while the second is describing the certainty re: that semi-known quantity, as in that average may flat out be wrong by +/- X.
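The 3d6 vs. 1d20 contrast above is easy to make concrete by computing the exact chance of reaching a few target numbers with each:

```python
# Same mean (10.5), different shapes: exact odds of rolling at or
# above a target number on 3d6 vs. 1d20, by brute-force enumeration.

from itertools import product
from fractions import Fraction

def p_at_least(target, faces, dice):
    """Exact chance that `dice` rolls of a `faces`-sided die total >= target."""
    outcomes = list(product(range(1, faces + 1), repeat=dice))
    good = sum(1 for o in outcomes if sum(o) >= target)
    return Fraction(good, len(outcomes))

for target in (8, 11, 14, 17):
    p3d6 = p_at_least(target, 6, 3)
    pd20 = p_at_least(target, 20, 1)
    print(f"target {target}: 3d6 {float(p3d6):.2f}, d20 {float(pd20):.2f}")
```

Below the mean, 3d6 is the safer bet (target 8: about 0.84 vs. 0.65); above it, the flat d20 wins (target 17: about 0.02 vs. 0.20), which is exactly the minion-vs-boss trade-off described above.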

SuperBidi wrote:

I'd generally agree with you, but it's worth considering variance in all this. Even if one is better than the other, if that gets lost in the noise unless you roll the dice 5,000 times, then maybe it effectively "doesn't matter", as you're unlikely to make that many dice rolls over a character's lifetime.

You don't need 5000 rolls. A few dozens are far enough unless you make an extremely lucky serie. 5000 rolls are only useful if you need .1 precision.

There are of course other things to take into consideration: the conditions to use your DPR and the amount of resources (hp being one) that you need to use to apply it. This is why most of the DPR calculations consider fairly common situations and low resource use.

I haven't run the numbers here, so I'll believe you here. Though I don't know what you mean by ".1 precision" in terms of this metric. Can you elaborate?

One thing I think does help is that, in general, damage is lower and HP is higher than PF1. It leads to less "wasted" damage for these calculations than previously. A character who does 300 damage is not any better than one who does 200 if each enemy only has 100 hp.

 1 person marked this as a favorite.
Squiggit wrote:
krazmuze wrote:
Gamblers fallacy...

Gambler's fallacy refers to the myth of 'maturity of chances'. The idea that if you roll a series of bad rolls your next one is more likely to be a good roll.

That has nothing to do with saying that 10 is larger than 8. That's more like basic arithmetic.

Not if 10 and 8 are your DPRs; then they are averages calculated from the fractional odds, so statistics absolutely applies here. While you calculate the odds using basic arithmetic, that represents the result over an infinity of rolls. If you have a finite number of rolls, then you have to simulate it and report the variances.

Vegas makes money off the gambler's fallacy because the gambler believes short-term results should achieve the average, so that if they are under the odds they will shortly get back over them. But only Vegas itself can achieve the odds: its quantity of rolls, over many, many gamblers, is much, much higher. The individual gambler will always be subject to the variance of shorter runs.

Which is why you need to simulate realistic numbers of rolls to find out whether 10 really is greater than 8, since the two can be statistically indistinguishable if the numbers are properly reported with their variance, such as 8+/-6 vs. 10+/-2. Now, while I do not think the variances are that bad over a level, I do think that the dozens of encounters with dozens of rolls each that represent a level are not going to achieve the 0.1 variance that would make the claim "this option is better than that option" true.

SuperBidi wrote:
krazmuze wrote:
The entire point that statistics makes is that you cannot say one is better than the other.

Of course you can. If you compare 1d20+6 and 1d20+7, whatever you roll, the second is always better.

Nope - again you are falling prey to the gambler's fallacy. What you said is only true if the 1d20 achieves the same average for the different players over the level the modifier applies to, but statistics tells you that is unlikely to happen, because the number of rolls is too low to achieve such a precise average. If the variance of the 1d20 is actually greater than 1, then it is actually likely that the second player, with the +7, performs worse.

You need to have a modifier that exceeds the variance in order to say one is a clearly better option. Say, the barbarian with a +11 bonus to greataxe attacks is going to do better than the wizard with a +1 bonus to dagger attacks. It is unlikely that the variance on the die over a level exceeds 10.

If you did not make the DC because you are on a cold streak, which is entirely possible over the dozens of rolls per dozens of encounters that make up a level, then it is not likely to matter that you had a +1, even if you constrain the discussion to the same rolls rather than comparing options for different players with different rolls.
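One way to put numbers on this dispute: simulate two hypothetical builds whose per-round damage averages differ by about 20%, give each a level's worth of rolls, and count how often the lower-average build still finishes ahead. Every parameter below is made up:

```python
# Two builds with identical hit chance but different damage (per-roll
# averages of about 9.9 vs. 8.25) each get 144 rolls per simulated
# level. We count how often the weaker build ends the level ahead.

import random

random.seed(7)  # fixed seed for reproducibility

def level_total(hit_on, damage, rolls=144):
    """Total damage over one level's worth of d20 attack rolls."""
    return sum(damage for _ in range(rolls)
               if random.randint(1, 20) >= hit_on)

trials = 5000
upsets = 0
for _ in range(trials):
    strong = level_total(hit_on=10, damage=18)
    weak = level_total(hit_on=10, damage=15)
    if weak >= strong:
        upsets += 1
rate = upsets / trials
print(f"weaker build ahead in {rate:.1%} of simulated levels")
```

With a 20% gap the upset rate comes out in the low single-digit percents, so over a level a gap that large is usually, but not always, visible; smaller gaps drown faster.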

 2 people marked this as a favorite.

I haven't run the numbers here, so I'll believe you here. Though I don't know what you mean by ".1 precision" in terms of this metric. Can you elaborate?

Usually what is used to bound an average estimate is called the 95% CI, or confidence interval. What that means is that if someone else repeats your simulation with the same number of rolls, you are 95% confident that their results will be within the +/- bounds you reported for your average. This is necessary because fractional odds are only exact for an infinite number of rolls; any finite number of rolls must be reported with +/- bounds.

So what if it takes 5000 rolls to achieve a 95% CI of +/- 0.1? That precision is a meaningless number when rolling the dozens of encounters that make up a level; it simply is not possible to achieve.

A QA/MFG engineer would be out of work if they told the boss they sampled a dozen widgets out of a production run of a million and it conformed to the expected norms. They have to use statistics to determine how many units they actually need to sample to achieve the confidence interval bounds they are comfortable with. Hopefully their process is Gaussian, which takes fewer samples than a uniform process to achieve high confidence.

I am not saying stop with the DPR - I am saying qualify it with precision bounds using variances across realistic simulations.
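For what it's worth, the normal approximation gives a quick sketch of how the 95% CI on a plain d20 average tightens with the number of rolls; under that approximation, +/- 0.1 takes on the order of 13,000 rolls (a DPR distribution's variance would differ, so this is illustrative only):

```python
# 95% CI half-width on the mean of n d20 rolls, using the normal
# approximation: half-width ~= 1.96 * sigma / sqrt(n), where the
# standard deviation of one d20 is sqrt((20^2 - 1) / 12) ~= 5.77.

import math

sigma = math.sqrt((20 ** 2 - 1) / 12)   # std dev of a single d20

for n in (144, 1000, 5000, 13000):
    half_width = 1.96 * sigma / math.sqrt(n)
    print(f"n={n:>6}: 95% CI is mean +/- {half_width:.2f}")
```

At 144 rolls (a rough level's worth) the interval is still nearly +/- 1 on the d20 average, which is the scale of many build-to-build differences being argued over.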

 3 people marked this as a favorite.

Krazmuze, I'm sorry, but you really don't know what you are talking about… I know you have a point about the variance, and there is really something to what you're trying to say, but the way you are saying it is completely incorrect…

 1 person marked this as a favorite.

Fine - prove me wrong.

Make a spreadsheet that runs a dozen encounters with a dozen attacks each to simulate a level, thousands and thousands of times. Report back to me when you have achieved the fractional odds with high enough precision that you trust you can report on the fractional differences between builds.

I will not wait up, because I already know that the people who have done this had to simulate 50,000 rolls to get confidence in the averages backing the build differences they were simulating. And even then, people tell them they did it wrong because their numbers are slightly off from what was calculated using fractional odds. Fractional odds only represent the result of infinite rolls; finite rolls suffer statistical imprecision.

My feel for the numbers is that over a level you certainly can claim the barbarian is better than the wizard at melee. But this fighter option vs. that ranger option? Not buying it until someone shows me a realistic level simulation with confidence intervals.

SuperBidi wrote:

You don't need 5000 rolls. A few dozen are enough, unless you hit an extremely lucky series. 5000 rolls are only useful if you need 0.1 precision.

Did you not see my example above using a couple dozen real rolls? The first two dozen I ran had variation of much greater magnitude than most charop discussions of "this option is a fraction better than that option". Again, it comes back to the gambler's fallacy: people think the odds balance themselves out over runs that are orders of magnitude shorter than it actually takes.

krazmuze wrote:

So what if it takes 5000 rolls to achieve a 95% CI of +/- 0.1? That precision is a meaningless number when rolling the dozens of encounters that make up a level; it simply is not possible to achieve.

If you roll 100d20, you have a more than 90% chance of an average between 9.5 and 11.5, so less than 1 point of difference from the true average. And we are speaking of a 1-point difference, which would translate to roughly a 10% difference in DPR, around the minimum DPR difference that would be taken into account by a normal player. And that's the most extreme case, where the variance is maximized, as I've just rolled a die without static bonuses.

So, no, you don't need 5000 rolls. During a single game, you should be able to feel a 20% difference in DPR.

So DPR calculation is far from invalid. It's very valid. Even if it's not the only thing to take into account when building a character, of course.

 4 people marked this as a favorite.
krazmuze wrote:

Then do the level simulation and prove me wrong. Every simulation I have seen either calculated the fractional odds, which are only exact for infinite trials, or simulated 50,000 runs to get a precise average, which is not the reality of any player's level.

The fact is that IF the variance is greater than the differences in average, then the build is not more important than the dice.

This is statistics 101

3.7+/-1 and 3.6+/-1.2

You cannot conclude that A is better than B; you must instead conclude that they are not significantly different, because the ranges of the averages overlap substantially.

3.7+/-0.1 and 3.4+/-0.15

You absolutely can conclude that A is better than B, because the averages do not overlap (to whatever confidence you calculated; usually 95% is used). You cannot, however, conclude by how much: it could be 3.6 vs. 3.55, or it could be 3.8 vs. 3.25.

It is this very gambler's fallacy, thinking the average odds apply to you, that makes Vegas rich. The house can play the averages (because they make all the plays); the player cannot (because they cannot play enough).

Confidence intervals are only relevant when using imperfect measurements and estimations produced from a set of data points. None of us are using that. We are using perfect calculations, derived from the underlying mathematics which defines the system. So yes, we can say that (3.7 +/- 1) is both significantly different from, and greater than (3.6 +/- 1.2) on average, because we have 100% confidence in our calculations.
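For what "perfect calculations" means in practice: the expected damage of a single Strike can be computed exactly by enumerating the 20 equally likely die faces, with no sampling at all. A simplified sketch (the +10 bonus, AC 18, and d8+4 damage are made-up numbers; deadly/fatal dice and other riders are ignored):

```python
def expected_damage(bonus, ac, avg_dmg):
    """Exact expected damage of one simplified PF2-style Strike,
    enumerating all 20 equally likely die faces.
    Assumes: crit on beating AC by 10, nat 20 upgrades the outcome one
    step, nat 1 downgrades it, crits deal double damage, nothing else."""
    total = 0.0
    for face in range(1, 21):
        result = face + bonus
        if result >= ac + 10:
            degree = 2    # critical hit
        elif result >= ac:
            degree = 1    # hit
        else:
            degree = 0    # miss
        if face == 20:
            degree = min(degree + 1, 2)
        elif face == 1:
            degree = max(degree - 1, 0)
        total += avg_dmg * degree  # degree 2 doubles, 1 is normal, 0 is zero
    return total / 20

# Hypothetical attacker: +10 to hit, d8+4 damage (average 8.5), vs AC 18
print(expected_damage(10, 18, 8.5))  # 6.8
```

The output is the exact infinite-sample average; the debate in this thread is about how far any finite level's worth of rolls can drift from it.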

 1 person marked this as a favorite.

Fine - prove me wrong and simulate a level's worth of rolls using actual rolls and not odds.

It is a random process - using the fractional odds tells you the perfect result for an infinite number of rolls.

A player's results over a level are a finite sample of the infinite rolls needed to achieve those perfect odds. If you want a better estimate than one player, you have to simulate many, many players over a level. Confidence intervals absolutely do apply.

I have yet to see anyone report on their +/- variance, despite giving a precise fraction that says this option is better than that option.

 2 people marked this as a favorite.
krazmuze wrote:
Fine - prove me wrong.

I have to say that it's refreshing to see someone who actually understands statistics discuss issues with the infinite DPR threads that pop up from time to time. I am finding your critique quite informative. I try to stay out of these threads because the people who do these types of calculations get defensive about their value rather than express a genuine willingness to understand how to improve them or use them responsibly.

I am also curious to hear your assessment of the underlying data used for DPR. My main contention is what I believe are ANOVA problems. DPR calculations are based on contrived examples that the author asserts are representative, essentially an inductive approach versus a deductive one.

The other major problem with many of these threads is the conclusions drawn, related to your confidence interval argument. It frequently amounts to misinformation, i.e. suggesting that the guy who hits the golf ball farthest off the tee in a specific setting is the better golfer, or will hit for greater distance in a golf tournament, when the opposite could be true.

 4 people marked this as a favorite.
N N 959 wrote:
I have to say that its refreshing to see someone who actually understands statistics

I don't know how you can really say this after all of the repeated misuse of terminology and misrepresentations of basic math.

Quote:
It frequently amounts to misinformation i.e. suggesting that the guy who hits the golf ball farthest off the T in a specific setting is the better golfer, or will hit for greater distance in a golf tournament, when the opposite could be true.

Can you give a specific example? In golf you have goals beyond simply hitting the ball as far as possible, but when talking about how much damage characters do in PF I can't think of any normal scenario where doing more damage is actually detrimental. You eventually hit a threshold where it doesn't matter based on monster HP, but to suggest "the opposite could be true" doesn't really seem representative of how combat works in PF2.

You could contrive a specific monster ability that makes that true, but that would still be a specific exception rather than a general principle.

Quote:
DPR calculations are based on contrived examples that the author asserts are representative, essentially an inductive approach versus a deductive one.

The only major attempt to quantify DPR on this forum presented multiple scenarios for consideration. Specific discussions tend to focus on specific examples as their whole premise, too, which also doesn't really fit your assertion.

What would you consider a proper deductive approach in this scenario anyways?

krazmuze wrote:

Fine - prove me wrong and simulate a level's worth of rolls using actual rolls and not odds.

It is a random process - using the fractional odds tells you the perfect result for an infinite number of rolls.

A player's results over a level are a finite sample of the infinite rolls needed to achieve those perfect odds. If you want a better estimate than one player, you have to simulate many, many players over a level. Confidence intervals absolutely do apply.

I have yet to see anyone report on their +/- variance, despite giving a precise fraction that says this option is better than that option.

You're asking me to roll dice a bunch of times? What is the point of that? Let's say I roll 100 times and find that, by chance, I roll in the 33rd percentile, and my results are lower than expected. Who cares? That information is useless because it's not generalizable to anyone else, and it is completely outside of anyone's control, so it has no bearing on what choice is the correct one.

 2 people marked this as a favorite.
Squiggit wrote:
Can you give a specific example?

In PF1? Things like Power Attack or Deadly Aim. Sometimes it's more important to hit than to hit for power. This is especially true if hitting debuffs the target in ways not accounted for by DPR.

Quote:
In golf you have goals beyond simply hitting the ball as far as possible, but when talking about how much damage characters do in PF I can't think of any normal scenario where doing more damage is actually detrimental.

That's because you're ignoring any opportunity cost of straight damage. Which would you prefer, a teammate who killed its target 20% quicker, or one who allowed everyone else to kill their target 20% quicker?

The obvious response to this is that, well, you're only trying to measure who does more damage. And my response is, to what end? What actually matters? Doing "damage," or finishing the encounter with less damage taken and fewer resources used? But the flawed conclusions from DPR analyses are frequently used by posters to beat others about the head and shoulders with claims that X class is "balanced."

Quote:
You eventually hit a threshold where it doesn't matter based on monster HP, but to suggest "the opposite could be true" doesn't really seem representative of how combat works in PF2.

You're taking the statement out of context. That concept was used in the golf example. What can be true is that a class with a "higher" DPR doesn't actually have a higher DPR in actual gameplay.

Quote:
You could contrive a specific monster ability that makes that true, but that would still be a specific exception rather than a general principle.

True, and then I'd be guilty of doing what these DPR spreadsheets do.

Quote:
The only major attempt to quantify DPR on this forum is one that presented multiple scenarios for consideration.

Multiple scenarios? How many different scenarios do you think this game has? What confidence testing has been done to show that the "multiple" scenarios are actually representative? 1, 2, and 3 actions aren't even in the universe of representing game combat. If you're doing an actual "analysis," then you have to prove your scenarios are actually representative. No one does that.

Quote:
What would you consider a proper deductive approach in this scenario anyways?

Actual game play statistics, not contrived spreadsheet analysis. Go play the same scenario 1000 times and get a stat. Then modify that thing you're claiming is superior DPS and do it again. Do it with all the different classes as support. Do it across all levels, against all monster types. Do you see where this is going? You have people insisting their limited analysis is representative of some truth. Prove it. Show me the actual game play stats.

N N 959 wrote:
Squiggit wrote:
What would you consider a proper deductive approach in this scenario anyways?
Actual game play statistics, not contrived spreadsheet analysis. Go play the same scenario 1000 times and get a stat. Then modify that thing you're claiming is superior DPS and do it again. Do it with all the different classes as support. Do it across all levels, against all monster types. Do you see where this is going? You have people insisting their limited analysis is representative of some truth. Prove it. Show me the actual game play stats.

Why in the world would I care what choices some braindead player made, when he decided to hit the nearest enemy every single turn, with no thought for tactics or planning? I want to know what is possible, and what I can do, not what someone else did. Your argument is like saying that when trying to optimize an athlete's sprinting performance, we should be looking at how average people sprint, and ignore how Usain Bolt sprints. It's the opposite. The average person is completely irrelevant. We want to know how the best sprinters sprint.

Strill wrote:
Why in the world would I care what choices some braindead player made, when he decided to hit the nearest enemy every single turn, with no thought for tactics or planning?

Uh, no. You do this with the same player, the same system mastery.

Look, never mind. You want to use DPR calcs presented in these forums? Knock yourself out.

DPR calculations seem like they’re most useful in figuring out how much leeway and breathing room you have for any given class or build.

 6 people marked this as a favorite.

N N 959 wrote:
Prove it.

No?

I haven't seen anyone claim that these numbers are representative of anything other than... exactly what they're supposed to represent.

If you believe the numbers and methodology are flawed, cool, but that puts the burden on you to prove your point, not on them to justify their own existence to you just because you don't like them.

To just jump in and make a bunch of demands of other people is absurd and silly and simply saying something is bad and contrived over and over doesn't make it any more true, either.

 2 people marked this as a favorite.
krazmuze wrote:
Fine - prove me wrong and simulate a level's worth of rolls using actual rolls and not odds.

In fact, the whole point of DPR calculation is to avoid rolling 5000 times to calculate DPR. It's even better than that: DPR calculation is the best simulation you can get as it simulates an infinite number of rolls.

Now, if you want to know the result of your next dice roll, you'll have to buy a DeLorean.

N N 959 wrote:
Strill wrote:
Why in the world would I care what choices some braindead player made, when he decided to hit the nearest enemy every single turn, with no thought for tactics or planning?

Uh, no. You do this with the same player, the same system mastery.

Look, never mind. You want to use DPR calcs presented in these forums? Knock yourself out.

You still run into the same problem if the DM is braindead in their tactics.

What you want is something that will allow you to simulate every possible circumstance, to determine what is best in each circumstance, but that doesn't exist. The best you can do is to look at the best cases, look at what complications are preventing you from reaching the best case, and speculate on how to overcome them.

 2 people marked this as a favorite.
Squiggit wrote:

N N 959 wrote:
Prove it.
No?

This reminds me of a friend of mine who insists Einstein's theory of General Relativity is wrong. I told him he's going to have to prove it, to which he insists he doesn't have to prove anything.

Quote:
I haven't seen anyone claim that these numbers are representative of anything other than... exactly what they're supposed to represent.

Then you're being intellectually dishonest. The entire premise behind these efforts is that these numbers/conclusions are generally applicable.

Quote:
If you believe the numbers and methodology are flawed, cool, but that puts the burden on you to prove your point, not on them to justify their own existence to you just because you don't like them.

No, that's not how it works in science. I am not the one presenting formulas sans actual data as proof of something. If I make an assertion that X is true based on some formulas I've put together on a spreadsheet, then I have the burden of proving my models are actually predictive. Einstein's theories had to be proven before they were accepted. You're trying to insist I have to disprove any random DPR analysis, otherwise we have to accept it as correct. That's not how it works.

Quote:
To just jump in and make a bunch of demands of other people is absurd and silly and simply saying something is bad and contrived over and over doesn't make it any more true, either.

I'm not making any demands. I'm saying that there is no proof behind any of this. None of it is substantiated by actual game data.

As I told Strill, you're entitled to subscribe to whatever you want in making decisions. I posted in this thread to thank krazemuze for sharing his/her knowledge of statistical analysis and inherent problems with these types of efforts. I am not telling anyone what to do or how to make their own decisions.

 1 person marked this as a favorite.

A fight broke out at your gaming store because some charop declared they are badly losing because their idiot cleric did not take an 18 and they do not want to play at such a table

OK, let's look at Treat Wounds: after every encounter the cleric tries to heal everyone once. What percentage of checks succeed? We will simply say this is 40 rolls for the level, since that fits nicely with someone saying that variance does not matter after several dozen rolls....

Indeed charop says you improve the odds 5% by going for that 18.

Trained Cleric with WIS +4

65% = median(sum((randi([1,20],40,1e6)+7)>=15)/40)

Trained Cleric with WIS +3

60% = median(sum((randi([1,20],40,1e6)+6)>=15)/40)

But I do not care about the million players, I care about the table next to mine so lets take some random samples that are close to the median.

Trained Cleric with WIS+4

55% = sum((randi([1,20],40,1)+7)>=15)/40

Trained Cleric with WIS+3

70% = sum((randi([1,20],40,1)+6)>=15)/40

(the actual inner sigma range spans about 15% - it did not take many samples to find these examples)

So my +4 behaved like a +2, while my neighbor +3 behaved like a +5! So even a coarse sampling shows the +1 is getting buried by +/-2 variance.

So the charop gets online and complains to Paizo about their broken game, who in response asks stores to survey what is going on at their tables: how good/bad does it really get?

25% min success rate for the WIS+4
95% max success rate for the WIS+3

OK that is mind blowing - the worst 'good' clerics failed to heal 30/40 times while the best of the 'bad' clerics healed 38/40 times!

Now to do this properly I would need the stats package in MATLAB that actually calculates 95% CI, but what we do know is that this will be a slightly smaller range than the min/max, and a much wider range than the anecdotal sample from a store.

Now lets do a similar analysis for rolling 2d8 40 times

360 = median(sum(randi([1,8],40,1e6)+randi([1,8],40,1e6)))

which is exactly 9*40 as the odds would say

Now lets take a random sample to compare our two tables

max(sum(randi([1,8],40,1)+randi([1,8],40,1)))/360

and all the tables

min(sum(randi([1,8],40,1e6)+randi([1,8],40,1e6)))/360

max(sum(randi([1,8],40,1e6)+randi([1,8],40,1e6)))/360

Roughly I can say that every cleric can be expected to heal within +/-30% of the expected median, but typical random sample results might be +/-10%.

Now compound this with the variance of all those missed heals, which I did not consider in my healing calc, as well as the variance of fumbles and crit successes? Lunch is long over, so I am not going to write that program. But this is enough to convince me that a +1 does not matter when you are using uniform dice that are highly variable.

Go play a wargame that uses dice pools that give more predictable results.
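For readers without MATLAB, the Treat Wounds experiment above can be reproduced with the Python standard library under the same assumptions (DC 15, 40 checks per level; the seed and player count are arbitrary choices):

```python
import random
import statistics

random.seed(1)
DC, ROLLS, PLAYERS = 15, 40, 20_000

def success_rates(bonus):
    """Each simulated player's fraction of successes over ROLLS d20+bonus checks vs DC."""
    return [
        sum(random.randint(1, 20) + bonus >= DC for _ in range(ROLLS)) / ROLLS
        for _ in range(PLAYERS)
    ]

wis4 = success_rates(7)   # trained cleric, WIS +4 (65% per-roll odds)
wis3 = success_rates(6)   # trained cleric, WIS +3 (60% per-roll odds)

print(statistics.median(wis4), statistics.median(wis3))  # medians ~0.65 vs ~0.60
print(min(wis4), max(wis3))  # individual "levels" land far outside the medians
```

The medians recover the exact per-roll odds, while individual 40-roll samples scatter widely enough that some WIS +3 players outperform some WIS +4 players, which is the point being argued above.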

 3 people marked this as a favorite.
krazmuze wrote:

A fight broke out at your gaming store because some charop declared they are badly losing because their idiot cleric did not take an 18 and they do not want to play at such a table

OK, let's look at Treat Wounds: after every encounter the cleric tries to heal everyone once. What percentage of checks succeed? We will simply say this is 40 rolls for the level, since that fits nicely with someone saying that variance does not matter after several dozen rolls....

Indeed charop says you improve the odds 5% by going for that 18.

Trained Cleric with WIS +4

65% = median(sum((randi([1,20],40,1e6)+7)>=15)/40)

Trained Cleric with WIS +3

60% = median(sum((randi([1,20],40,1e6)+6)>=15)/40)

But I do not care about the million players, I care about the table next to mine so lets take some random samples that are close to the median.

Trained Cleric with WIS+4

55% = sum((randi([1,20],40,1)+7)>=15)/40

Trained Cleric with WIS+3

70% = sum((randi([1,20],40,1)+6)>=15)/40

(the actual inner sigma range spans about 15% - it did not take many samples to find these examples)

So my +4 behaved like a +2, while my neighbor +3 behaved like a +5! So even a coarse sampling shows the +1 is getting buried by +/-2 variance.

So the charop gets online and complains to Paizo about their broken game, who in response asks stores to survey what is going on at their tables: how good/bad does it really get?

25% min success rate for the WIS+4
95% max success rate for the WIS+3

OK that is mind blowing - the worst 'good' clerics failed to heal 30/40 times while the best of the 'bad' clerics healed 38/40 times!

Now to do this properly I would need the stats package in MATLAB that actually calculates 95% CI, but what we do know is that this will be a slightly smaller range than the min/max, and a much wider range than the anecdotal sample from a store.

Now lets do a similar analysis for rolling 2d8 40 times

360 = median(sum(randi([1,8],40,1e6)+randi([1,8],40,1e6)))

which is exactly 9*40 as the odds would
...

So you're saying that you rolled a d20 40 times, checked the average, and because the results were not equal to the expected odds, you think that proves something?

Randomness is random. No s&\$%. It's entirely possible, though astronomically unlikely, that someone rolls a 1 every single time they pick up a d20. Why should I care what someone's rolls happened to be? Just because someone has an anecdote of how they rolled poorly doesn't change what the odds are, or what the correct choice is. To think that someone else's lucky or unlucky rolls have anything to do with you or your rolls is the gambler's fallacy.

Are you trying to explain the Law of Large Numbers or something? Do you think we don't understand it?

 2 people marked this as a favorite.
Strill wrote:
What you want is something that will allow you to simulate every possible circumstance, to determine what is best in each circumstance, but that doesn't exist.

I don't "want" anything from DPR posts. I came into this thread specifically to ask for krazemuze to give some feedback on other problems I see with DPR efforts.

Quote:
The best you can do is to Look at the best cases, look at what complications are preventing you from reaching the best-case, and speculate on how to overcome them.

Close. The best you can do is understand why these models are often tantamount to misinformation. That helps you understand the problem better and know the limitations of any conclusions that you might be tempted to draw.

 2 people marked this as a favorite.

@Strill

Yes, you clearly do not understand the law of large numbers if you think that the precise odds of a uniform die determine your performance in your RPG. The very simple fact is that the rolls you actually make will sit well within the high-variance regime, because you will always have a low number of rolls within a level. This is why I insist that if you want to talk DPR, include the +/- variance so that I can see which options are not significantly different. I know that 1+/-2 is not significant, and I will pick the option with more flavor that I like, but a 5+/-.5 is worth taking unless I really dislike its flavor.

As I said, I did the 40 rolls because someone else says all it takes is a few dozen rolls to overcome the variance of the die. This analysis shows that is clearly not true - a few dozen rolls is nowhere near the large numbers the law requires.

The Pathfinder devs do know this - it is why they added level to everything, so that you can have a range of outcomes where the dice variance simply does not matter, because you will always hit or always fail once you go beyond the threat range. That is because it is not fun when the kobold kills your legendary fighter.

Whereas D&D 5e decided to embrace that variance in the uniformity of the die and did away with the level stepping that 4e had, because random variance makes for more interesting improv storytelling. It is fun to remember the time your wizard crit the dragon with their dagger, even though that makes little sense.

Not sure how you can say variance is irrelevant when the two major RPGs have decided to take advantage of it in different directions. If they wanted to remove its significance, they would use the dice pools that wargames use.
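One way to put a number on "the +1 is buried by variance" versus "the odds still favor it" is to simulate how often the hypothetical WIS +4 cleric from the earlier posts actually records more successes than the WIS +3 cleric over one level's 40 checks. A stdlib sketch (parameters reused from those posts; seed and trial count arbitrary):

```python
import random

random.seed(2)
DC, ROLLS, TRIALS = 15, 40, 50_000

def successes(bonus):
    """Successes over one level's worth of d20+bonus checks vs DC."""
    return sum(random.randint(1, 20) + bonus >= DC for _ in range(ROLLS))

# Fraction of levels in which the WIS +4 cleric strictly out-heals the WIS +3 one
wins = sum(successes(7) > successes(6) for _ in range(TRIALS))
print(wins / TRIALS)  # roughly 0.63
```

So over a single level the better build comes out strictly ahead only about six times in ten, with ties and losses making up the rest, which is consistent with both sides here: the +1 shifts the odds, but any one level's dice can easily swamp it.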
