Discriminability
Discriminability One Sample Test
class hyppo.discrim.DiscrimOneSample(is_dist=False, remove_isolates=True)
A class that performs a one sample test of discriminability.
The discriminability index measures how well a data acquisition and preprocessing pipeline distinguishes between different subjects. The key insight is that repeated measurements of the same item should be more similar to one another than to measurements of different items. The one sample test measures whether the discriminability of a dataset differs from random chance. More details are in [1].
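The idea above can be sketched in plain Python. This is a minimal illustration, not hyppo's implementation: the helper name `sample_discriminability`, the use of Euclidean distance, and the choice to count ties as half a success are all assumptions made for the example.

```python
import math


def sample_discriminability(x, y):
    """Fraction of comparisons in which a repeated measurement of the same
    item is closer than a measurement of a different item (hypothetical helper)."""
    n = len(x)
    scores = []
    for i in range(n):
        # Distances from measurement i to every measurement of other items.
        between = [math.dist(x[i], x[k]) for k in range(n) if y[k] != y[i]]
        if not between:
            continue
        for j in range(n):
            if j == i or y[j] != y[i]:
                continue
            within = math.dist(x[i], x[j])
            wins = sum(1.0 for d in between if within < d)
            ties = sum(0.5 for d in between if within == d)  # assumption: ties count half
            scores.append((wins + ties) / len(between))
    if not scores:
        raise ValueError("need repeated measurements of at least two items")
    return sum(scores) / len(scores)


# Two well separated items, three measurements each.
x = [[0.0, 0.0]] * 3 + [[1.0, 1.0]] * 3
y = [0, 0, 0, 1, 1, 1]
print(sample_discriminability(x, y))  # 1.0
```

When every measurement is identical, every comparison is a tie and this sketch returns 0.5, the value one would expect when the labels carry no information.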
Parameters: is_dist : bool, optional (default: False)
Whether x is a distance matrix or not.
remove_isolates : bool, optional (default: True)
Whether to remove the measurements with a single instance or not.
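An id that appears only once has no repeated measurement to compare against. The following sketch shows one way such isolates could be filtered out before testing. The helper name `drop_isolates` is hypothetical, and the library's internal handling may differ.

```python
from collections import Counter


def drop_isolates(x, y):
    """Keep only measurements whose id occurs at least twice (hypothetical helper)."""
    counts = Counter(y)
    keep = [i for i, label in enumerate(y) if counts[label] > 1]
    return [x[i] for i in keep], [y[i] for i in keep]


x = [[0.0], [0.1], [5.0], [9.0], [9.2]]
y = ["a", "a", "b", "c", "c"]  # "b" has a single measurement
x_kept, y_kept = drop_isolates(x, y)
print(y_kept)  # ['a', 'a', 'c', 'c']
```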
See also
DiscrimTwoSample
- Two sample test for comparing the discriminability of two different measurements.
Notes
With \(D_x\) as the sample discriminability of \(x\), the one sample test evaluates the following hypotheses:
\[\begin{split}H_0: D_x &= D_0 \\ H_A: D_x &> D_0\end{split}\]
where \(D_0\) is the discriminability that would be observed by random chance.
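A permutation test for these hypotheses can be sketched as below: shuffle the sample ids to simulate \(D_0\), recompute the statistic, and report how often the shuffled value reaches the observed one. This is a sketch under stated assumptions, not the library's code. The function names, the half credit for ties, and the `(1 + count) / (1 + reps)` p-value convention are all choices made for this example.

```python
import math
import random


def discriminability(dist, y):
    """Sample discriminability from a precomputed (n, n) distance matrix."""
    n = len(y)
    scores = []
    for i in range(n):
        between = [dist[i][k] for k in range(n) if y[k] != y[i]]
        if not between:
            continue
        for j in range(n):
            if j != i and y[j] == y[i]:
                wins = sum(1.0 for d in between if dist[i][j] < d)
                ties = sum(0.5 for d in between if dist[i][j] == d)  # assumption
                scores.append((wins + ties) / len(between))
    return sum(scores) / len(scores)


def one_sample_pvalue(x, y, reps=200, seed=0):
    """Permutation p-value for H_A: D_x > D_0 (hypothetical helper)."""
    rng = random.Random(seed)
    dist = [[math.dist(a, b) for b in x] for a in x]
    observed = discriminability(dist, y)
    labels = list(y)
    exceed = 0
    for _ in range(reps):
        rng.shuffle(labels)  # break the link between data and ids
        if discriminability(dist, labels) >= observed:
            exceed += 1
    # Assumed convention: add one so the p-value is never exactly zero.
    return observed, (1 + exceed) / (1 + reps)


x = [[0.0, 0.0]] * 10 + [[1.0, 1.0]] * 10
y = [0] * 10 + [1] * 10
stat, pvalue = one_sample_pvalue(x, y)
print(stat, pvalue)
```

Only the labels are shuffled, so the distance matrix is computed once and reused across replications.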
References
[1] Eric W. Bridgeford, et al. "Optimal Decisions for Reference Pipelines and Datasets: Applications in Connectomics." bioRxiv (2019).
test(x, y, reps=1000, workers=-1)
Calculates the test statistic and p-value for the discriminability one sample test.
Parameters: x : ndarray
Input data matrix. x must have shape (n, p), where n is the number of samples and p is the number of dimensions. Alternatively, x can be a distance matrix of shape (n, n); is_dist must be set to True in this case.
y : ndarray
A vector containing the sample ids for our \(n\) samples.
reps : int, optional (default: 1000)
The number of replications used to estimate the null distribution in the permutation test that computes the p-value.
workers : int, optional (default: -1)
The number of cores to parallelize the p-value computation over. Supply -1 to use all cores available to the process.
Returns: stat : float
The computed discriminability statistic.
pvalue : float
The computed one sample test p-value.
Examples
>>> import numpy as np
>>> from hyppo.discrim import DiscrimOneSample
>>> x = np.concatenate([np.zeros((50, 2)), np.ones((50, 2))], axis=0)
>>> y = np.concatenate([np.zeros(50), np.ones(50)], axis=0)
>>> stat, pvalue = DiscrimOneSample().test(x, y)
>>> '%.1f, %.2f' % (stat, pvalue)
'1.0, 0.00'
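As the parameter description notes, x may also be an (n, n) distance matrix. The sketch below builds one with a Manhattan metric in plain Python, for cases where you want to supply your own distances. The helper name `manhattan_distance_matrix` is hypothetical, and the metric is only an example.

```python
def manhattan_distance_matrix(x):
    """Return the (n, n) matrix of pairwise Manhattan distances (hypothetical helper)."""
    return [[sum(abs(u - v) for u, v in zip(a, b)) for b in x] for a in x]


x = [[0.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
dist = manhattan_distance_matrix(x)
print(dist)  # [[0.0, 1.0, 4.0], [1.0, 0.0, 3.0], [4.0, 3.0, 0.0]]
```

With hyppo installed, a matrix like this (converted to an ndarray, with matching sample ids y) would be passed as `DiscrimOneSample(is_dist=True).test(dist, y)`.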
Discriminability Two Sample Test
class hyppo.discrim.DiscrimTwoSample(is_dist=False, remove_isolates=True)
A class that compares the discriminability of two datasets.
The two sample test measures whether the discriminability of one dataset differs from that of another. More details are in [1].
Parameters: is_dist : bool, optional (default: False)
Whether x1 and x2 are distance matrices or not.
remove_isolates : bool, optional (default: True)
Whether to remove the measurements with a single instance or not.
See also
DiscrimOneSample
- One sample test for discriminability of a single measurement.
Notes
Let \(\hat D_{x_1}\) denote the sample discriminability of one approach, and \(\hat D_{x_2}\) denote the sample discriminability of another approach. Then,
\[\begin{split}H_0: D_{x_1} &= D_{x_2} \\ H_A: D_{x_1} &> D_{x_2}\end{split}\]
Alternatively, tests can be done for \(D_{x_1} < D_{x_2}\) and \(D_{x_1} \neq D_{x_2}\).
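The three alternatives correspond to different tails of the null distribution of \(\hat D_{x_1} - \hat D_{x_2}\). The sketch below assumes that a list of null differences is already available and shows only how each alternative could be turned into a p-value. How hyppo generates its null distribution is not covered here, and the function name, the two-sided rule based on absolute values, and the toy numbers are all assumptions.

```python
def pvalue_from_null(observed_diff, null_diffs, alt="neq"):
    """Map an alternative hypothesis to a tail probability (hypothetical helper)."""
    if alt == "greater":    # H_A: D_x1 > D_x2
        extreme = sum(d >= observed_diff for d in null_diffs)
    elif alt == "less":     # H_A: D_x1 < D_x2
        extreme = sum(d <= observed_diff for d in null_diffs)
    elif alt == "neq":      # H_A: D_x1 != D_x2
        extreme = sum(abs(d) >= abs(observed_diff) for d in null_diffs)
    else:
        raise ValueError("alt must be 'greater', 'less' or 'neq'")
    # Assumed convention: add one so the p-value is never exactly zero.
    return (1 + extreme) / (1 + len(null_diffs))


null_diffs = [-0.3, -0.1, 0.0, 0.1, 0.2, 0.4]  # made-up values for illustration
observed = 0.25
for alt in ("greater", "less", "neq"):
    print(alt, pvalue_from_null(observed, null_diffs, alt))
```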
test(x1, x2, y, reps=1000, alt='neq', workers=-1)
Calculates the test statistic and p-value for a two sample test for discriminability.
Parameters: x1, x2 : ndarray
Input data matrices. x1 and x2 must have the same number of samples. That is, the shapes must be (n, p) and (n, q), where n is the number of samples and p and q are the number of dimensions. Alternatively, x1 and x2 can be distance matrices, where the shapes must both be (n, n); is_dist must be set to True in this case.
y : ndarray
A vector containing the sample ids for our n samples. Should be matched to the inputs such that y[i] is the corresponding label for x1[i, :] and x2[i, :].
reps : int, optional (default: 1000)
The number of replications used to estimate the null distribution in the permutation test that computes the p-value.
alt : {"greater", "less", "neq"} (default: "neq")
The alternative hypothesis for the test. The test can check whether the first dataset is more discriminable (alt = "greater"), less discriminable (alt = "less"), or differently discriminable (alt = "neq") than the second.
workers : int, optional (default: -1)
The number of cores to parallelize the p-value computation over. Supply -1 to use all cores available to the process.
Returns: d1 : float
The computed discriminability score for x1.
d2 : float
The computed discriminability score for x2.
pvalue : float
The computed two sample test p-value.
Examples
>>> import numpy as np
>>> from hyppo.discrim import DiscrimTwoSample
>>> x1 = np.ones((100, 2), dtype=float)
>>> x2 = np.concatenate([np.zeros((50, 2)), np.ones((50, 2))], axis=0)
>>> y = np.concatenate([np.zeros(50), np.ones(50)], axis=0)
>>> discrim1, discrim2, pvalue = DiscrimTwoSample().test(x1, x2, y)
>>> '%.1f, %.1f, %.2f' % (discrim1, discrim2, pvalue)
'0.5, 1.0, 0.00'