# A little help?

I think I need a little help with math and probability. This is one of those things about choosing colored balls from a bag, but I only sorta remember how to do the math on it—and it isn’t actually about choosing colored balls from a bag. Alas.

The institution that employs me has a policy of spot-testing for COVID-19. They test 5% of the students every week—let’s say, for mathematical ease and as a fairly good approximation, that there are 5,000 students and that they test 250, chosen at random, every Wednesday.

In the first week of this program, those tests reveal one positive case: call it 1/250 or 0.4%. My understanding is that this ratio implies an expectation of 20/5,000 (0.4%), or 19 untested students walking around carrying the disease. Now, I’m using “expectation” here as a mathematical term, and so probably incorrectly, but my understanding is that if there were fewer than, say, 5 students with this disease, it would be a stroke of luck to catch one of them in the 250 tests. My instinct is that it would be surprising if there were fewer than 10 or more than 30 untested COVID-19-positive students on campus. Does that sound right?
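For anyone who wants to check the “stroke of luck” intuition, here is a minimal Python sketch. It treats each week’s test as the classic balls-from-a-bag problem (the hypergeometric distribution), assuming the post’s round numbers (5,000 students, 250 sampled), a perfectly accurate test, and a fixed number `K` of positive students on test day. `K` and the function names are illustrative, not anything the school actually uses.

```python
from math import comb

N = 5000  # assumed number of students (the post's round figure)
n = 250   # assumed weekly random sample (5%)

def prob_exactly(k, K):
    """Hypergeometric chance of exactly k positives in the sample,
    given K positive students on campus on test day."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def prob_at_least_one(K):
    """Chance the sample catches at least one of the K positives."""
    return 1 - prob_exactly(0, K)

for K in (1, 5, 10, 20, 30, 50):
    print(f"K={K:3d}  P(at least 1)={prob_at_least_one(K):.3f}  "
          f"P(exactly 1)={prob_exactly(1, K):.3f}")

# Maximum-likelihood K: the campus count that makes "1 of 250" most likely.
mle = max(range(N + 1), key=lambda K: prob_exactly(1, K))
print("most likely K:", mle)
```

Under those assumptions, catching anyone at all when only 5 students are positive is about a 23% shot, and the count that makes “1 in 250” most likely is 20, matching the 20/5,000 figure above.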

However, in the second week, they tested another 250 people and had no positive results at all. 0%.

Now, they are currently displaying the cumulative result as 1/500, or 0.2%. That ratio implies 10/5,000 positive—10 students, of whom one was caught-and-treated. But that sounds wrong to me as a description of what is going on. For one thing, the status of individuals changes over the course of a week, so there’s a big difference between a single test of 10% of the populace and two tests of 5% a week apart. That matters because both the prevalence and the rate of spread are useful things to know, and the rate is perhaps more useful than the prevalence, since it tells us whether our containment practices are, you know, working.
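The 1/500 display amounts to assuming nothing changed between the two Wednesdays. Here is a sketch of that assumption, with the same hypothetical setup: 5,000 students, 250 per week, a perfect test, the same `K` in both weeks, and the two weekly samples drawn independently.

```python
from math import comb

N, n = 5000, 250  # assumed population and weekly sample size

def pmf(k, K):
    """Hypergeometric chance of k positives in n draws when K of N are positive."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def joint_likelihood(K, weekly_results=(1, 0)):
    """Likelihood of both weeks' results if K stayed fixed and the
    two weekly samples were drawn independently."""
    likelihood = 1.0
    for k in weekly_results:
        likelihood *= pmf(k, K)
    return likelihood

pooled_mle = max(range(N + 1), key=joint_likelihood)
print("most likely fixed K:", pooled_mle)
```

The most likely single `K` for “one positive, then zero” comes out at 10, which is exactly the 10/5,000 reading. So the disagreement is with the fixed-`K` assumption, not with the arithmetic.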

It sounds to me like the first draw got ‘lucky’ in catching not 1/20 of the students who were sick but 1/10 or 1/5—surely the spread rate is at least one-for-one—but then it’s completely plausible mathematically that there are 20 students who would have tested positive at the time of the Week Two test, and that none of them happened to be among the 5% chosen.
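How plausible is a clean Week Two if 20 students were positive? A quick check under the same assumptions (a random sample of 250 from an assumed 5,000, a perfect test); the comparison values of `K` are illustrative.

```python
from math import comb

N, n = 5000, 250  # assumed population and sample size

def prob_none_sampled(K):
    """Chance that a random sample of n students contains none of the K positives."""
    return comb(N - K, n) / comb(N, n)

for K in (10, 20, 40):
    print(f"K={K}: P(zero positives in the sample) = {prob_none_sampled(K):.2f}")
```

With 20 positives on campus, a sample of 250 misses all of them roughly 36% of the time, so an empty week is well within the ordinary run of luck.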

I will add: in between the randomized tests for Week One and Week Two, six students tested positive. I don’t know why they were tested—they may have been ‘contacts’ of the one randomly found positive, or they may have shown symptoms, or perhaps they were tested because of some other contact or requirement. I don’t know precisely when they were tested, either: if they had been included in either randomized test, would they have tested positive or negative? I have no idea. But the implication, it seems to me, is that there are some students walking around with this disease, such that the Week Two result is not an exactly accurate ratio.

So, here’s my question: what should we be expecting from Week Three? What is the range of possible or probable results? What can we know about the ‘actual’ rate of positives among the students, and what would that rate imply for the range of expected results from the 5% test?

Or, more specifically, if the result of Week Three is, f’r’ex, five positive tests, can we say with some confidence that there is a growing number of COVID-contagious students, now upwards of a hundred, and if we don’t make some changes it’s going to be much, much more? Or would it be every bit as likely that there have been between thirty and fifty the whole time, and the week-to-week variance is not showing a compelling upward trend at all, and that our current containment practices are working pretty well? I feel like it’s important to know which of those things is the case before the numbers actually come out.
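The Week Three question can be framed the same way. For a few hypothetical true counts `K` (assumed fixed on test day, with the 250-student sample, 5,000 students, and a perfect test as before), how likely is a result of five or more positives?

```python
from math import comb

N, n = 5000, 250  # assumed population and sample size

def pmf(k, K):
    """Hypergeometric chance of exactly k positives in the sample."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def prob_at_least(k, K):
    """Chance of k or more positives in the sample, given K on campus."""
    return 1 - sum(pmf(j, K) for j in range(k))

for K in (20, 40, 100):  # illustrative campus-wide counts, not estimates
    print(f"K={K:3d}  expected positives={n * K / N:.1f}  "
          f"P(5 or more)={prob_at_least(5, K):.3f}")
```

Under these assumptions, five positives would be hard to square with about 20 cases (well under 1%), somewhat surprising with about 40 (around 5%), and unremarkable with about 100 (better than even).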

Tolerabimus quod tolerare debemus,
-Vardibidian.

## 5 thoughts on “A little help?”

1. Dan P

You know, I started on this, and then I stopped. There’s an awkward handful of free parameters we’re estimating here with just two samples:

* growth rate: even assuming it’s a constant due to constant efficacy of safety measures, it’s still unknown — in fact, if I understand your question correctly, this is THE parameter you’d like to solve for?
* true infection counts for each week
* test sensitivity and specificity

In general, your intuitions seem pretty good to me. The precision of an estimate goes up with the square root of the sample size, so the statement “about 0.2% of students would test positive during the first two weeks” is about 1.4x more precise than either “about 0.4% of students were positive during the first week” or “no [additional?–see below] students were positive during the second week.” But then you have this confounding external information that six students tested positive in non-random sampling, none of whom, clearly, were sampled in the second week. I think this points to a flaw in the random sampling, in that students who test positive for any reason are probably withdrawn from the pool.
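Dan’s square-root point can be made concrete with the usual normal-approximation standard error for a proportion. The sketch includes a finite population correction, since 250 or 500 is a noticeable slice of an assumed 5,000. It is rough: with only one positive, the normal approximation is crude.

```python
from math import sqrt

N = 5000  # assumed population

def standard_error(p, n):
    """Normal-approximation standard error of a sample proportion,
    with a finite population correction for sampling without replacement."""
    fpc = (N - n) / (N - 1)
    return sqrt(p * (1 - p) / n * fpc)

p = 1 / 500  # pooled estimate after two weeks
se_250 = standard_error(p, 250)
se_500 = standard_error(p, 500)
print(f"SE with 250: {se_250:.4f}, SE with 500: {se_500:.4f}, ratio: {se_250 / se_500:.2f}")
```

The ratio comes out near 1.45 rather than exactly the square root of 2 because of the finite population correction. At 1/500, the standard error is also nearly as large as the estimate itself.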

In short, statistics am hard.

1. Vardibidian Post author

Yeah, my feeling right now is that the weekly test of 5% is not actually giving us any usefully interpretable information about the population at all, but it’s presumably nice for the people they happen to catch and, again presumably, treat.

I am left, however, with the sense that we have no idea at all whether our current containment measures are working—and a concern that the administration will make decisions based on how well those containment measures are working despite not having any reliable information about that at all. On the plus side, the surrounding area is still very low-prevalence, so maybe lucky is better than right?

Thanks,
-V.

2. Michael

If your true positive rate is 0.4% on campus, then I think a truly random sample of 5% has a greater than 30% chance of not hitting any of those positive cases. That tells me that 5% sampling doesn’t give you any real precision at a very low incidence rate (or doesn’t provide any real confidence at that level of precision), which is reasonable if you aren’t planning on taking any action until the incidence rate rises much higher.

With a true positive rate of 0.4%, a sample of 5% turning up 5 positives would be very startling.
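One way to read Michael’s point about precision is to ask how big the random sample would have to be to have a 90% chance of catching at least one case if 20 of 5,000 students (0.4%) were positive. The 90% target is a hypothetical choice, and the sketch assumes a perfect test and simple random sampling.

```python
from math import comb

N, K = 5000, 20  # assumed population and positives (0.4%)

def prob_miss_all(n):
    """Chance that a random sample of n students contains none of the K positives."""
    return comb(N - K, n) / comb(N, n)

target = 0.90  # hypothetical goal: 90% chance of catching at least one case
n_needed = next(n for n in range(N + 1) if 1 - prob_miss_all(n) >= target)
print(f"sample needed: {n_needed} students ({n_needed / N:.1%} of campus)")
```

Under those assumptions, the answer comes out to roughly 540 students, a bit under 11% of campus, so a 5% sample will regularly come back empty at this prevalence.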

3. Vardibidian Post author

Update: Week Three had a positive rate of 0.5%, which seems to be in the general realm of ‘holding steady’, meaning that I think all three weeks are plausible outcomes for there being between zero and a hundred undiscovered cases on campus at the time of the testing. At least, I feel like I haven’t learned anything that should lead me to panic.

On the other hand, they evidently increased the testing from 5% to 8%, which is… probably good? But makes it even more difficult to have a sense of whether our prevalence rate is increasing, and if so, how rapidly.
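To compare weeks with different sample sizes, one option is to give each week its own range of plausible campus-wide counts. The sketch below inverts two one-sided exact hypergeometric tests (a Clopper–Pearson-style interval) at the 95% level. It assumes 5,000 students and a perfect test. It also assumes that Week Three’s 0.5% was 2 positives out of 400 tests (8% of 5,000), since the raw count isn’t given.

```python
from math import comb

N = 5000  # assumed campus population

def cdf(k, K, n):
    """P(X <= k) when n of N students are sampled and K are positive."""
    hits = sum(comb(K, j) * comb(N - K, n - j) for j in range(k + 1))
    return hits / comb(N, n)

def plausible_range(k, n, alpha=0.05):
    """Campus-wide counts K not rejected by either one-sided exact test
    at level alpha/2 (a Clopper-Pearson-style interval)."""
    keep = [
        K for K in range(N + 1)
        if cdf(k, K, n) >= alpha / 2                       # not too many
        and (k == 0 or 1 - cdf(k - 1, K, n) >= alpha / 2)  # not too few
    ]
    return min(keep), max(keep)

# (positives, tests) per week; Week Three's 2/400 is an assumption.
weeks = {"Week One": (1, 250), "Week Two": (0, 250), "Week Three": (2, 400)}
for label, (k, n) in weeks.items():
    lo, hi = plausible_range(k, n)
    print(f"{label}: {k}/{n} positive -> about {lo} to {hi} positives on campus")
```

Under those assumptions, the three ranges come out to roughly 1 to 110, 0 to 70, and 4 to 90. They overlap heavily, which fits the “holding steady, but hard to tell” reading.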

Thanks,
-V.

4. Chris Cobb

Unless asymptomatic cases are running higher than has been documented among college-age populations, I think the number of cases turning up through symptomatic testing would have to be higher to support the upper end of the range of undiscovered cases you posit. Unless the Week 3 number of cases discovered through testing people with symptoms was up around 20, I would think your upper bound for undiscovered cases would be more like fifty (says the person with no background in probability and statistics, so take this comment for what it is worth).

Meanwhile, to neither brag nor jinx but report, my home campus has now gone nearly two weeks without a positive case. It does seem that under current conditions, operating a residential college safely is possible but by no means guaranteed. The local Division 1 football team played its first game this weekend, so we are holding our breath about what happens late this week. Word was that fewer than 10,000 spectators, almost exclusively students, were in the stadium, which ordinarily seats 80,000, so social distancing may have been maintained. No crowds gathered outside the stadium, either (a huge change from usual). So we’ll see . . .
