Fix a model's operating point (its TPR and FPR at a chosen threshold), then see how precision and false-positive counts change when you deploy to larger areas and/or to regions where the positive class is rarer.
Let \(\pi = P(Y=1)\) be the true prevalence of a class you care about in the region you evaluate on, and let \(\text{TPR} = P(\hat Y=1 \mid Y=1)\) and \(\text{FPR} = P(\hat Y=1 \mid Y=0)\) be properties of the model you trained for detecting this class at a fixed threshold. Precision is \(\text{Prec}=P(Y=1\mid \hat Y=1)\). By Bayes' rule:
\[ \text{Prec}(\pi) = \frac{\text{TPR}\,\pi}{\text{TPR}\,\pi + \text{FPR}\,(1-\pi)}. \]
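As a quick sanity check, here is a minimal Python sketch of this formula (the function name and example numbers are illustrative, not taken from any particular library or from the playground itself):

```python
def precision_at_prevalence(tpr: float, fpr: float, pi: float) -> float:
    """Precision P(Y=1 | Yhat=1) implied by Bayes' rule at prevalence pi."""
    tp_mass = tpr * pi          # P(Yhat=1, Y=1)
    fp_mass = fpr * (1 - pi)    # P(Yhat=1, Y=0)
    return tp_mass / (tp_mass + fp_mass)

# At a balanced prevalence, a TPR of 0.90 and FPR of 0.05 give high precision.
print(precision_at_prevalence(0.90, 0.05, 0.5))   # ~0.947
```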
Two different "scale" effects get mixed together:
(A) Base-rate (prevalence) effect: Precision depends on \(\pi\). If you move from a balanced evaluation (\(\pi=0.5\)) to a deployment region where \(\pi\) is tiny, precision can drop sharply unless \(\text{FPR}\) is extremely small.
(B) Volume (area) effect: The count of false positives depends on how many negatives you scan. If you deploy to an area \(k\) times larger than your validation set, the expected number of false positives scales by a factor of \(k\):
\[ \mathbb{E}[\text{FP}] \approx \text{FPR}\,(1-\pi)\,k \cdot N_0, \]
where \(N_0\) is the number of samples in the validation set, so the deployment covers roughly \(k\,N_0\) samples (worked numbers in the sketch below).
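To put concrete numbers on both effects, here is a short sketch; the TPR, FPR, deployment prevalence, and validation size \(N_0\) are made-up values for illustration:

```python
TPR, FPR = 0.90, 0.05
N0 = 10_000                     # validation set size (illustrative)

# (A) Base-rate effect: same TPR/FPR, but precision collapses as pi shrinks.
for pi in (0.5, 0.1, 0.01, 0.001):
    prec = TPR * pi / (TPR * pi + FPR * (1 - pi))
    print(f"pi={pi:<6} precision={prec:.3f}")   # 0.947, 0.667, 0.154, 0.018

# (B) Volume effect: the expected false-positive *count* grows linearly
# with the deployment size k * N0, even at a fixed prevalence.
pi_deploy = 0.01
for k in (1, 10, 100):
    expected_fp = FPR * (1 - pi_deploy) * k * N0
    print(f"k={k:<4} expected false positives ~ {expected_fp:,.0f}")   # 495 / 4,950 / 49,500
```

The precision column falls purely because \(\pi\) falls; the false-positive counts grow purely because \(k\) grows.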
Notice that \(\mathbb{E}[\text{FP}]\) grows linearly with deployment area, even if precision stays unchanged. So you can have a model that "looks good" on balanced tests but still produces a painful number of false alarms when deployment is huge and the landscape is mostly negative.
In this playground, the validation set has a fixed size and a fixed prevalence \(\pi_0=0.5\). You pick a TPR and FPR, then simulate scaling the model over a larger deployment area, optionally changing the deployment prevalence \(\pi\), to see how precision and the false-positive count change.
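If you want to reproduce this offline, a rough Monte Carlo sketch of what such a simulation might look like is below; the sampling model, function name, and parameter values are assumptions for illustration, not the playground's actual code:

```python
import numpy as np

def simulate_deployment(tpr, fpr, pi, n_samples, rng):
    """Draw true labels at prevalence pi, then fire the detector with
    probability tpr on positives and fpr on negatives (illustrative model)."""
    y = rng.random(n_samples) < pi                        # true labels
    yhat = np.where(y, rng.random(n_samples) < tpr,       # hits on positives
                       rng.random(n_samples) < fpr)       # false alarms on negatives
    tp = int(np.sum(yhat & y))
    fp = int(np.sum(yhat & ~y))
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return precision, fp

rng = np.random.default_rng(0)
# Balanced validation set vs. a 100x larger deployment at 1% prevalence.
print(simulate_deployment(0.90, 0.05, 0.50, 10_000, rng))      # high precision, few FPs
print(simulate_deployment(0.90, 0.05, 0.01, 1_000_000, rng))   # low precision, many FPs
```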