Biologists use something called the mark-and-recapture method to estimate the size of an animal population based upon a small sample of animals viewed. Mathematically-inclined voters with a few hours of free time can do the same thing with the items on which we're voting.

Here's a simplified version of the mark-and-recapture method:

(1) Record the name of every item you see while voting until you have exactly 100 unique item names*.
(2) Vote exactly 50 more times, counting the number of times you see each item from your list.
(3) Calculate the total number of times you saw items on your list during your 50 votes.
(4) Divide 10,000 by your total from (3).

The result of (4) is a rough estimate of the number of items there are. The more estimates we collect, the more our average result will tend towards the actual number of items.

*The item names you record in step (1) should be items you've seen after the bottom 25% were culled. Once you have a list of 100 unique items, you can reuse that list for steps (2) and (3) if you want to take a new sample.
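
For anyone who would rather script the arithmetic, steps (3) and (4) boil down to the Lincoln index formula: marked items times captures, divided by recaptures. Here is a minimal Python sketch, assuming a list of 100 marked items and two items shown per vote (so 50 votes are 100 captures). The function and parameter names are made up for illustration.

```python
def lincoln_estimate(recaptures, marked=100, votes=50, items_per_vote=2):
    """Estimate the population size from one mark-and-recapture sample."""
    if recaptures <= 0:
        raise ValueError("need at least one recapture to form an estimate")
    captures = votes * items_per_vote  # 50 votes show 100 items in total
    return marked * captures / recaptures


print(lincoln_estimate(10))  # 1000.0
```

With the defaults, the numerator is 100 * 100 = 10,000, which is where the constant in step (4) comes from.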

---

I just did one mark-and-recapture estimate and saw items from my list of 100 items a total of 10 times in 50 votes, producing the following result:

Estimated Number of Items: 1,000

I was told there would be no math...

I just took a new sample, and saw the items on my list 14 times in 50 votes, producing the following result:

Estimated Number of Items: 714 (almost certainly an underestimate)

Another sample of 50 votes taken, revealing 13 marked items.

Estimated Number of Items: 769

Average Estimate So Far: 878 items

When we count occurrences during the fifty votes, do we count how many items of the hundred appear, or how many times an item from that list appears? I started doing this and within the first ten votes I had seen the same item from my list twice.

Eric Morton wrote:

Another sample of 50 votes taken, revealing 13 marked items.

Estimated Number of Items: 769

Average Estimate So Far: 878 items

Average estimate divided by .75 (to compensate for the Cull) gives an initial estimate of 1170 items, which sounds reasonable.

And since this is after the cull, we can assume this number is just 75 percent of the total submitted, which would give us...

1,170.67 (878 divided by .75) if we use your 878 average.

If we take your first estimate, since I think we can agree the 714 seems low, we get 1,333. I still had a gut feeling there were more like 1,500-2,000 entries, but if you also count the approximately 10 percent that are DQs, that'd put us at 1,481 original entries, which would put us in that range.

Or if we want to look at it this way, 32 is 3 percent of 1,066.66 and 2 percent of 1,600, so we're probably looking for something in that range.
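
The back-calculations in this post can be written out as a short Python sketch. It assumes the cull removed exactly 25% and that disqualifications were exactly 10% of the original entries. Both are round figures from the thread rather than confirmed numbers, and the helper names are only illustrative.

```python
CULL_FRACTION = 0.25  # bottom 25% removed (round figure from the thread)
DQ_FRACTION = 0.10    # "approximately 10 percent" disqualified (assumed exact here)


def pre_cull(post_cull):
    """Scale a post-cull count back up to the pre-cull count."""
    return post_cull / (1 - CULL_FRACTION)


def original_entries(pre_cull_count):
    """Scale a pre-cull count back up to include disqualified entries."""
    return pre_cull_count / (1 - DQ_FRACTION)


print(round(pre_cull(1000)))                    # 1333
print(round(original_entries(pre_cull(1000))))  # 1481
print(round(32 / 0.03), round(32 / 0.02))       # 1067 1600
```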

You are sampling too late IMO.

About an hour ago I saw a good item (IMO) for the first time which I do not believe I have seen before since the contest started.

The longer you vote, the more frequently you see certain items paired with certain other items. Other Marathon Voters have posted observing the same effect.

It would have been most accurate immediately after the cull IMO.

Possibly more accurate immediately after daily Paizo network reboot.

Nazard wrote:
When we count occurrences during the fifty votes, do we count how many items of the hundred appear, or how many times an item from that list appears? I started doing this and within the first ten votes I had seen the same item from my list twice.

During the 50 votes, just count items from your list. You can ignore repeat items that aren't on your list.

CHEERS wrote:
It would have been most accurate immediately after the cull IMO.

Based on my voting experience, the accuracy of any one estimate will also vary depending upon the time of day, and upon who's voting that day. But the accuracy of the average estimate should improve as more samples are taken.

Brilliant idea, we should have started this days ago; then we would have been able to compare estimates before and after the cull to see how close we were. But, late is better than never! I did a sampling following your methodology, and saw 16 of 100 tagged items in 50 votes, leading to an estimate of 625 items.

Took another sample and saw 12 of the items I tagged in 50 votes; estimate for this sample: 833 items.

Average Estimate So Far: 818 items.

Another sample had 11 of my tagged items. Estimate for this sample is 909 items.

I'm also curious about the total number of submissions, but it seems to me these estimates are a bit too low. I recall Clark saying last year that the top 32 represents the top 2-3% of all submissions. As Jacob pointed out, that's 1066-1600 submissions. And this year set the new record, so I'm quite sure the number is closer to 1500 than 1000.

Mikko Kallio wrote:
I'm also curious about the total number of submissions, but it seems to me these estimates are a bit too low. I recall Clark saying last year that the top 32 represents the top 2-3% of all submissions. As Jacob pointed out, that's 1066-1600 submissions. And this year set the new record, so I'm quite sure the number is closer to 1500 than 1000.

When Clark said that was he stating statistics or just using it as a figure of speech?

Unfortunately, it's not that simple. I've been kind of trying to figure it out myself, but there are several problems:

1. You see two items at a time, and item 2 is guaranteed not to be the same as item 1.

2. What's actually being done is more like mark and recapture with re-release into the pool. That's much more complicated.

3. The fact that the items you see are NOT completely random (you see the item with fewest votes) complicates matters, making it depend on timing and a few other things.

It can be done, it's just way more complicated than the relatively simple equation in the first post. I think in order to even treat it as mark and recapture with release, you have to treat the left-hand and right-hand sides as two different sets of data, but that's just a hunch.

I'm restarting my figures from when the field was winnowed; however, my initial guess before they trimmed 25% was somewhere between 1700 and 1900 items, which makes it highly unlikely I'll see all of them even post-removal of the 25%. I haven't run my stats since the cull, but I've voted just over 150 times since then.

Edit: As you pick more and more items, your count will asymptotically approach the actual count. In this graph the green line is the number of unique items I'd seen; it was still aimed pretty strongly up when I'd seen over 500 unique items (I'd seen just over 30 pairs at that time).

Further Edit: As of now, I've seen 82 pairs (164 non-unique items) and 145 unique items (126 once, 17 twice, and 2 three times). I haven't seen my item since the trim. My ugly approximation tells me that means about 1415 items post cull (which would mean approximately 1887 pre-cull). So I'm sticking with my guess of 1800+/-100 items (post-initial DQ).

Eric Morton wrote:
Nazard wrote:
When we count occurrences during the fifty votes, do we count how many items of the hundred appear, or how many times an item from that list appears? I started doing this and within the first ten votes I had seen the same item from my list twice.
During the 50 votes, just count items from your list. You can ignore repeat items that aren't on your list.

That didn't quite answer my question, so I'll rephrase. During the fifty votes, I saw an item from my list, and three pairs later I saw it again. If that was the only item from my list, should I be reporting 1 item seen or 2?

Given the number of votes being cast, and the fact that this has been in place since the beginning, I doubt this is a significant factor. The spread between most seen and least seen can't be very large.

Unfortunately, it's not that simple. I've been kind of trying to figure it out myself, but there are several problems:

3. The fact that the items you see are NOT completely random (you see the item with fewest votes) complicates matters, making it depend on timing and a few other things.

shammond42 wrote:

3. The fact that the items you see are NOT completely random (you see the item with fewest votes) complicates matters, making it depend on timing and a few other things.

Given the number of votes being cast, and the fact that this has been in place since the beginning, I doubt this is a significant factor. The spread between most seen and least seen can't be very large.

Oh, I agree 100%. It just makes coming up with a correct formula that much trickier :) All those little caveats and gotchas can add up.

The 1800 +/- 100 (or even 200) seems viable and Jacob W. Michaels is spot on with what I was going to comment about.

Nazard wrote:
That didn't quite answer my question, so I'll rephrase. During the fifty votes, I saw an item from my list, and three pairs later I saw it again. If that was the only item from my list, should I be reporting 1 item seen or 2?

If you saw an item from your list twice, count it twice.

The formula from the OP is the Lincoln index method, which allows for tagged items to be re-released into the population (and potentially recounted) during the process of sampling.

Unfortunately, it's not that simple.

You are correct that we won't be able to hit the exact item count using the method in the OP, though not necessarily for the reasons you suggested.

The fact that we are sampling two items at the same time instead of one item at a time will skew the data, but only slightly. The difference between (N choose 1) and (N choose 2) gets smaller as N gets larger, and we're dealing with a reasonably large N.

The biggest source of error is going to be the "Neither" button, which results in weak and average items being displayed more often. Items that keep getting kicked back because they aren't pulling ahead of the pack are going to result in many tagged items being over-represented, which will lead to an underestimate of population size.

The final estimate we get using the Lincoln index method from the OP will likely be closer to the number of average-quality items than the total number of items. Those items are involved in more "Neither" votes, making it more likely they get put back in the queue. (The Lincoln index method assumes that all results are equally likely, which isn't the case here, thanks to "Neither" buttons and weighting algorithms.)

To improve upon the Lincoln index estimate, we'd need to create a model that takes the effects of the "Neither" button into account and apply that to our final number.

I was also hoping to find an equation that relates the number of unique items in a sample of 1000 items to the likely size of the item population, because I recorded just about 600 unique items in 1000 views before the cull. There has to be a way to estimate a population size based on that data (one that's much better than the Lincoln index method), I just haven't yet found it.

Just did this myself.

Out of my sample of 50, got 7 listed items.

Estimated Number of Items: 1,429

Which would put us at 1,905 items before the cull...

Adding in those last two samples, we get...

Average After 6 Samples: 907 items.

Saw 13, for 769 items estimated.

And hit Marathon status in the process. Yay me!

I just saw 15 of my tagged items in another 50-vote sample, for 666 items.

Average After 8 Samples: 860 items.

I just saw 18 of my tagged items in a freakish outlier sample, for an obviously incorrect estimate of 555.

Average After 9 Samples: 826 items.

Moving forward, I'm generating a new list of tagged items, choosing one-hundred items at random from those I know I've seen since the culling. This will prevent any quirks of the voting algorithm from influencing my list of tagged items, and possibly produce different results than those of my first several samples.

I feel like I'm getting the opposite of Eric, here: freakishly low samplings (leading to, likely, inflated estimates.)

I just got 6 marked items, giving me an estimated 1667 items.

I think I'll be switching to the random marked item method as well.

Just finished my first sample after generating my new list of tagged items and saw 16 of them, for an estimate of 625 items to counteract Sean McGowan's influence.

Average Estimate After 11 Samples: 884 items (1,179 items before the cull)

Eric Morton wrote:

Just finished my first sample after generating my new list of tagged items and saw 16 of them, for an estimate of 625 items to counteract Sean McGowan's influence.

Average Estimate After 11 Samples: 884 items (1,179 items before the cull)

Sean just posted this in the JUDGES You are Iron man thread an hour ago:

Remember, if there were 500 entries, culling the bottom 25% still leaves 375 entries... and 125 of those probably are in the 26-50% range for quality.

And remember that even the 51-75% range for quality may be "maybe good enough for a book of magic items."

And we're not guaranteed an even distribution of quality. Maybe 120 are at the 26% quality and 5 are in the 27-50% quality, ya never know.

And there were more than 500 entries, of course. :)

Okay, I've done some additional analysis (including some trial-and-error solution of the order(N-1) equation that resulted). At this point, my guess is there were about 932 items before the field was narrowed, and there are currently about 699.

That seems to align nicely with the graph I'm seeing, as well. It was pretty clear based on the curve of the number of unique items seen there was no way there were 1700 items before the change.

Got 17 for a calculated number of 588.

michaeljpatrick wrote:
When Clark said that was he stating statistics or just using it as a figure of speech?

I'm not telling :)

Don't forget that we disqualified "about 10%" of the entries for not following the rules... (but I'm not saying how many of those were disqualified before and how many after public voting began...)

Ignoring statistics and just going off gut reactions from Clark and Vic's responses, I'm going to ballpark the amount of items at an even gajillion.

I just saw 14 tagged items in 50 votes, for an estimate of 714 items.

Average Estimate After 13 Samples: 848 items (1,130 items before the cull)

This morning, I ran a quick computer simulation that selected multiple samples of 1,000 random items from populations of various size.

Using this method, I found that a population of 1,100 +/- 50 pre-cull items fits with the results of both my first 1,000 views and Sean McGowan's first 1,000 views. (We're the only two voters I'm aware of so far that have tracked and posted data for our first 1,000 views.)

That result is also consistent with the estimate we've generated in this thread using the mark-and-recapture method, so I'm fairly confident that we're in the right ballpark when it comes to non-disqualified entries.
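
For anyone who wants to try something similar, here is a rough Python sketch of that kind of simulation. It is not the code I actually ran. It assumes every item is equally likely to be shown and that each vote shows two different items, which the real voting algorithm does not guarantee. The function names, trial count, and seed are arbitrary choices.

```python
import random


def unique_after_views(population, views=1000, rng=random):
    """Count distinct items seen when pairs are drawn uniformly at random."""
    seen = set()
    for _ in range(views // 2):
        # Each vote shows two different items from the population.
        seen.update(rng.sample(range(population), 2))
    return len(seen)


def mean_unique(population, trials=200, seed=0):
    """Average the unique-item count over several simulated voters."""
    rng = random.Random(seed)
    total = sum(unique_after_views(population, rng=rng) for _ in range(trials))
    return total / trials


for n in (900, 1000, 1100, 1200):
    print(n, round(mean_unique(n)))
```

Comparing the printed averages against a voter's recorded unique count after 1,000 views suggests which population sizes are plausible under these simplified assumptions.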

Eric Morton wrote:

This morning, I ran a quick computer simulation that selected multiple samples of 1,000 random items from populations of various size.

Using this method, I found that a population of 1,100 +/- 50 pre-cull items fits with the results of both my first 1,000 views and Sean McGowan's first 1,000 views. (We're the only two voters I'm aware of so far that have tracked and posted data for our first 1,000 views.)

There's a third person doing so. That's how I can do things like create this post-narrow graph of unique vs. total items seen. :)

Granted, infinite series do funny things, but the post-narrow curve appears to be converging on a number below 800.

...the post-narrow curve appears to be converging on a number below 800.

Let us know how many unique views you get when you hit 500 votes and I'll factor it in.

Just counted and got 13 tagged items in 50 votes for an estimate of 769. Also, I seem to have missed one earlier estimate, which I'm adding in now.

Average Estimate After 15 Samples: 842 items (1,122 items before the cull)

Dropping samples that are more than one standard deviation from the mean gives us:

Average Estimate, Dropping Outliers: 763 items (1,017 items before the cull)
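
The outlier filter can be sketched in Python as follows. This assumes the sample standard deviation (the population version would be an equally reasonable reading), and the list holds a handful of estimates quoted earlier in the thread, not the exact set behind the averages above.

```python
from statistics import mean, stdev


def drop_outliers(estimates):
    """Keep estimates within one sample standard deviation of the mean."""
    m, s = mean(estimates), stdev(estimates)
    return [e for e in estimates if abs(e - m) <= s]


sample = [1000, 714, 769, 625, 833, 909, 1429]  # illustrative subset only
kept = drop_outliers(sample)
print(kept, round(mean(kept)))
```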

Just counted 16 tagged items in 50 votes, for an estimate of 625.

Average Estimate After 16 Samples: 828 items (1,104 items before the cull)

Average Estimate, Dropping Outliers: 752 items (1,003 items before the cull)

All right, I'll admit it. I'm stumped. Why are you using 10,000 as your numerator and not 5,000? The Lincoln model is new to me, and I want to understand it before I start proclaiming it to the plebeians.

If we're using N = (1st-capture * 2nd-capture) / overlap, then why isn't it 100*50/overlap?

I ask this even realizing that this would drastically lower these estimates and therefore makes little sense.

Arkos wrote:

All right, I'll admit it. I'm stumped. Why are you using 10,000 as your numerator and not 5,000? The Lincoln model is new to me, and I want to understand it before I start proclaiming it to the plebeians.

If we're using N = (1st-capture * 2nd-capture) / overlap, then why isn't it 100*50/overlap?

I ask this even realizing that this would drastically lower these estimates and therefore makes little sense.

Since you see 2 items per vote, 50 votes equal 100 captures.

Right! I forgot the most important mathematical lesson I ever learned in college: never drink and derive. Under the cold light of day, that makes perfect sense. Thanks for bringing this theory to my attention!

Vic Wertz wrote:
Don't forget that we disqualified "about 10%" of the entries for not following the rules... (but I'm not saying how many of those were disqualified before and how many after public voting began...)

Vic, you are a sadistic monkey scribbling on our homework with a black sharpie =p

Given the broad range of things that a monkey could do to your homework, that's not so bad!

This is starting to feel like the Drake Equation.

Eric Morton wrote:
...the post-narrow curve appears to be converging on a number below 800.
Let us know how many unique views you get when you hit 500 votes and I'll factor it in.

At 500 votes post-change, I've seen 1000 items (obviously) and 534 unique items. There was a short spate of unique items in the low 400s; my current estimate at vote 500 is 702 unique items post-reduction (leading to approximately 936 before reduction and according to Vic something like 1040 overall).

I'm quite curious to see how it pans out. I may try deriving the Heinous Equation again; I screwed up somewhere in the middle last time.

Okay, ready? I've voted 606 times and seen 573 unique items (post-25%). My current estimate of the number of items out there is 694-695. 695 is probably the more accurate of the two, and is based on the following equation:

E(v) = beta * ( 1 - gamma^(v-1) ) / ( 1 - gamma )

Where:
v is the number of votes
E(v) is the expected number of unique items after v votes

and

beta = 2 - 1/(T-1)
gamma = 1 - 2/T

... where T is the total number of items.

Instead of solving it, you can pretty quickly narrow down the range by plugging in values of v (known) and T (guessed). Not going to put the derivation here, but I'm reasonably sure it's right :)
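
The plug-in-and-narrow approach can be automated with a bisection search. The Python sketch below uses the equation exactly as posted. It assumes E(v) grows with T for a fixed v, and it starts the search at the number of unique items seen, since T cannot be smaller than that. The function names, upper bound, and tolerance are arbitrary.

```python
def expected_unique(v, T):
    """E(v) as posted: beta * (1 - gamma**(v - 1)) / (1 - gamma)."""
    beta = 2 - 1 / (T - 1)
    gamma = 1 - 2 / T
    return beta * (1 - gamma ** (v - 1)) / (1 - gamma)


def solve_total(v, unique_seen, hi=1e6, tol=1e-6):
    """Bisect on T until E(v) matches the observed unique-item count."""
    lo = float(unique_seen)  # there cannot be fewer items than were seen
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_unique(v, mid) < unique_seen:
            lo = mid  # too few expected uniques, so T must be larger
        else:
            hi = mid
    return (lo + hi) / 2


# 606 votes and 573 unique items: the result lands between 694 and 695.
print(round(solve_total(606, 573), 1))
```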

10% or so Disqualified.

25% Lower Cull.

No comments whether Paizo did an upper Cull or Ongoing Culls in conjunction with the lower cull.

