The `kstest` function in MATLAB performs the one-sample Kolmogorov-Smirnov test, which compares the empirical distribution of a sample against a reference probability distribution (the standard normal by default) to determine whether they differ significantly.
Here's a code snippet demonstrating the use of `kstest`:
% Sample data
data = randn(100, 1); % Generate 100 random samples from a normal distribution
% Perform the Kolmogorov-Smirnov test against a normal distribution with mean 0 and std 1
[h, p] = kstest(data, 'CDF', makedist('Normal', 'mu', 0, 'sigma', 1)); % 'CDF' takes a distribution object or a two-column matrix
disp(['Hypothesis test result (h): ' num2str(h)]);
disp(['p-value: ' num2str(p)]);
Understanding the Kolmogorov-Smirnov Test
What is the K-S Test?
The Kolmogorov-Smirnov test (K-S test) is a non-parametric statistical test used to determine if a sample comes from a specific continuous distribution. It does this by measuring the largest vertical distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution. The K-S test is particularly useful because it is distribution-free: the same procedure applies to any fully specified continuous reference distribution, making it a versatile tool in statistical analysis.
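To make the "distance" concrete, here is a minimal sketch that computes the statistic by hand and compares it with the value `kstest` reports. The seed, sample size, and variable names are illustrative assumptions, and the reference distribution is taken to be the standard normal.

```matlab
rng(0);                           % fixed seed so the run is repeatable
x = sort(randn(100, 1));          % sorted sample (assumed standard normal)
n = numel(x);
F = normcdf(x, 0, 1);             % reference CDF evaluated at each sample point

% The ECDF jumps from (i-1)/n to i/n at the i-th sorted point,
% so the gap has to be checked on both sides of each jump.
dPlus  = max((1:n)' / n - F);     % ECDF above the reference CDF
dMinus = max(F - (0:n-1)' / n);   % ECDF below the reference CDF
D = max(dPlus, dMinus);

[~, ~, ksstat] = kstest(x);       % default reference is the standard normal
fprintf('manual D = %.6f, kstest ksstat = %.6f\n', D, ksstat);
```

The two printed values should agree up to floating-point rounding.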
When to Use the K-S Test?
The K-S test is appropriate in various scenarios, particularly when you need to check if your sample data follows a particular theoretical distribution. You might consider the K-S test in the following situations:
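- When the data are continuous and the reference distribution is fully specified in advance, whatever the sample size.
- When you want to compare a sample against a known distribution.
- When comparing two samples without making parametric assumptions about their distributions (in MATLAB this is handled by the separate `kstest2` function, not `kstest`).
It’s important to note that sample size affects what the test can detect: with small samples the test has low power and may miss real departures from the reference distribution, while with very large samples even practically negligible deviations can come out as statistically significant.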
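For the two-sample case in the list above, MATLAB provides `kstest2`. The following is a small sketch with simulated data; the sample sizes and the 0.5 shift are arbitrary choices made for illustration.

```matlab
rng(1);                                   % fixed seed for a repeatable run
sampleA = randn(80, 1);                   % hypothetical first sample
sampleB = 0.5 + randn(120, 1);            % hypothetical second sample, shifted by 0.5

% Two-sample K-S test: are both samples drawn from the same distribution?
[h2, p2, ks2stat] = kstest2(sampleA, sampleB);
fprintf('h = %d, p = %.4f, statistic = %.4f\n', h2, p2, ks2stat);
```

The samples do not need to have the same length.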

The `kstest` Function in MATLAB
Function Syntax
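The syntax for the `kstest` function in MATLAB is quite straightforward. Options are passed as name-value pairs:
[h, p, ksstat] = kstest(data, 'CDF', refDist, 'Alpha', alpha)
Here’s what the parameters mean:
- data: A vector of sample data you want to test.
- 'CDF': An optional name-value argument giving the reference distribution, either as a probability distribution object (for example, one created with `makedist`) or as a two-column matrix of x values and CDF values. If it is omitted, the standard normal distribution is used.
- 'Alpha': An optional name-value argument that specifies the significance level for the hypothesis test (default is 0.05).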
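As a short illustration of how options are passed, the sketch below calls `kstest` with the `'Alpha'` and `'Tail'` name-value arguments. The data and the chosen option values are assumptions made for the example.

```matlab
rng(2);                                            % fixed seed for a repeatable run
data = randn(100, 1);                              % simulated sample

hDefault = kstest(data);                           % standard normal reference, 5% level
hStrict  = kstest(data, 'Alpha', 0.01);            % same test at the 1% level
[hTail, pTail] = kstest(data, 'Tail', 'larger');   % one-sided alternative

fprintf('default: %d, alpha = 0.01: %d, one-sided: %d (p = %.4f)\n', ...
        hDefault, hStrict, hTail, pTail);
```

Because the p-value does not depend on `'Alpha'`, a rejection at the 1% level always implies a rejection at the 5% level.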
Return Values
When you execute the `kstest` function, you can request up to four output values. The first three are the ones used most often:
- h: This represents the hypothesis test result. A value of 1 indicates the null hypothesis is rejected (the sample does not follow the reference distribution), while 0 indicates it is not rejected.
- p: The p-value is the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed. A small p-value suggests that the sample data does not follow the specified distribution.
- ksstat: This is the computed test statistic, the maximum absolute difference between the empirical and theoretical CDFs.
- cv: An optional fourth output, the critical value against which ksstat is compared at the chosen significance level.
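A minimal sketch that requests every output, including the optional fourth one (the critical value), might look like this. The data are simulated and the variable names are illustrative.

```matlab
rng(3);                                   % fixed seed for a repeatable run
data  = randn(100, 1);                    % simulated sample
alpha = 0.05;                             % significance level

[h, p, ksstat, cv] = kstest(data, 'Alpha', alpha);

fprintf('h      = %d\n', h);
fprintf('p      = %.4f\n', p);
fprintf('ksstat = %.4f\n', ksstat);
fprintf('cv     = %.4f (critical value at alpha = %.2f)\n', cv, alpha);
```

Here `h` is 1 exactly when the p-value does not exceed `alpha`.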

Step-by-Step Guide to Using `kstest`
Step 1: Prepare Your Data
Before conducting the K-S test, it's vital to have your data organized. Here’s an example of how to generate random sample data using MATLAB:
data = randn(100,1); % Generate 100 random data points from a normal distribution
This line generates 100 data points from a standard normal distribution, allowing you to test whether this sample follows a normal distribution.
Step 2: Conduct the K-S Test
Using Theoretical Distributions
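To test the generated data against a theoretical normal distribution, pass a probability distribution object as the `CDF` argument:
refDist = makedist('Normal', 'mu', 0, 'sigma', 1); % parameters fixed in advance, not estimated from data
[h, p, ksstat] = kstest(data, 'CDF', refDist);
In this example:
- The `CDF` argument specifies the reference distribution. `kstest` accepts a probability distribution object (here created with `makedist`) or a two-column matrix of x values and CDF values; it does not accept a function handle such as `@normcdf`.
- The mean and standard deviation are fixed in advance. Plugging in `mean(data)` and `std(data)` instead would fit the reference to the very sample being tested, which makes the reported p-value too large; for a normality check with estimated parameters, the Lilliefors test (`lillietest`) is the appropriate alternative.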
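When the parameters of the normal distribution are not known beforehand and would have to be estimated from the sample, the Lilliefors test is designed for that situation. The sketch below contrasts it with a plain `kstest` call; the mean of 5 and standard deviation of 2 are arbitrary simulation settings.

```matlab
rng(4);                                  % fixed seed for a repeatable run
data = 5 + 2 * randn(100, 1);            % normal sample, but not standard normal

% Lilliefors test: H0 is "normal with some mean and variance",
% with both parameters estimated from the sample internally.
[hL, pL, statL] = lillietest(data);

% Plain kstest: H0 is "standard normal", so this sample is rejected.
[hK, pK, statK] = kstest(data);

fprintf('lillietest: h = %d, p = %.4f, statistic = %.4f\n', hL, pL, statL);
fprintf('kstest:     h = %d, p = %.4g, statistic = %.4f\n', hK, pK, statK);
```

The contrast is the point: `kstest` without a `CDF` argument checks against the standard normal specifically, not against normality in general.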
Using Empirical Distributions
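If you wish to compare the sample against an empirical or tabulated distribution, supply the CDF as a two-column matrix whose first column holds x values and whose second column holds the corresponding CDF values. The following lines show the syntax using the empirical cumulative distribution function (ECDF) returned by `ecdf`:
[empiricalCDF, xValues] = ecdf(data);
% ecdf repeats the smallest x value in its first two rows; drop row 1 so the x column is unique
[h, p, ksstat] = kstest(data, 'CDF', [xValues(2:end), empiricalCDF(2:end)]);
This code segment is only a syntax illustration: a sample tested against its own ECDF trivially fails to reject. In practice the table would come from a separate reference data set or from tabulated CDF values whose range covers the sample, and for comparing two samples directly `kstest2` is the better choice.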
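As a sketch of the two-column matrix form of the `CDF` argument, the example below tabulates the standard normal CDF on a grid and checks that it gives essentially the same statistic as a distribution object. The grid limits and spacing are assumptions; the grid has to cover every value in the sample.

```matlab
rng(5);                                   % fixed seed for a repeatable run
data = randn(100, 1);                     % simulated sample

% Column 1: x values, column 2: reference CDF at those x values.
xGrid    = linspace(-6, 6, 1201)';        % assumed wide enough to cover the sample
cdfTable = [xGrid, normcdf(xGrid)];

[~, ~, ksTable]  = kstest(data, 'CDF', cdfTable);            % tabulated CDF
[~, ~, ksObject] = kstest(data, 'CDF', makedist('Normal'));  % distribution object

fprintf('table: %.6f, object: %.6f\n', ksTable, ksObject);
```

The small difference between the two results comes from interpolating between grid points.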
Step 3: Interpreting the Results
Once you have executed the `kstest`, it is crucial to understand the results:
- The output h indicates whether the null hypothesis is rejected (1) or not (0). If you have a result where h = 1, it suggests that the data does not follow the hypothesized distribution.
- The p-value informs you about the strength of the evidence against the null hypothesis. A common threshold is 0.05; if `p < 0.05`, you might conclude that the data does not follow the specified distribution.
- Lastly, ksstat is the test statistic itself: the largest vertical gap between the sample's empirical CDF and the theoretical CDF, a number between 0 and 1.
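To see how these outputs behave, the sketch below tests one sample that matches the standard normal reference and one that clearly does not. Both samples are simulated, and the names are illustrative.

```matlab
rng(6);                                   % fixed seed for a repeatable run
normalData  = randn(200, 1);              % drawn from the reference distribution
uniformData = rand(200, 1);               % values in (0, 1), not standard normal

[hN, pN, dN] = kstest(normalData);        % reference: standard normal
[hU, pU, dU] = kstest(uniformData);       % same reference, mismatched sample

fprintf('normal sample:  h = %d, p = %.4f, ksstat = %.4f\n', hN, pN, dN);
fprintf('uniform sample: h = %d, p = %.3g, ksstat = %.4f\n', hU, pU, dU);
```

For the uniform sample the statistic exceeds 0.5, because every value is positive while the standard normal CDF already equals 0.5 at zero, so the test rejects.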

Common Pitfalls to Avoid
Issues with Sample Size
One of the key challenges with the K-S test is its dependence on sample size. With small samples the test has little power, so it may fail to reject even when the data come from a noticeably different distribution. With very large samples the opposite happens: a small deviation might be flagged as statistically significant when, in practical terms, the difference is minor. It helps to look at the size of `ksstat` alongside the p-value.
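One way to see how sample size enters the test is to look at the critical value that `kstest` returns as its fourth output. The sketch below uses three arbitrary sample sizes; the data are simulated and only the critical values matter here.

```matlab
rng(7);                                   % fixed seed for a repeatable run
sizes = [20 200 2000];                    % illustrative sample sizes
cv    = zeros(size(sizes));

for k = 1:numel(sizes)
    % The critical value depends on n and alpha, not on the data values.
    [~, ~, ~, cv(k)] = kstest(randn(sizes(k), 1));
end

for k = 1:numel(sizes)
    fprintf('n = %4d  critical value = %.4f\n', sizes(k), cv(k));
end
```

The threshold shrinks as n grows, so a gap between the two CDFs that passes unnoticed at n = 20 can lead to rejection at n = 2000.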
Misinterpreting Results
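Another common pitfall is the misinterpretation of p-values. It’s crucial to remember that a p-value measures the strength of evidence against the null hypothesis. It is not the probability that the null hypothesis is true, and a large p-value does not prove that the data follow the reference distribution; it only means the test found no significant evidence against it.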

Advanced Usage of `kstest`
Custom CDFs
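MATLAB allows users to test against custom cumulative distribution functions. This can be beneficial when evaluating samples against non-standard distributions. `kstest` does not take a function handle directly, so evaluate the custom CDF at the sample points and pass the result as a two-column matrix. Here’s an example of how to test against a custom exponential distribution:
lambda = 1; % rate parameter
customCDF = @(x) max(0, 1 - exp(-lambda * x)); % exponential CDF, zero for negative x
[h, p, ksstat] = kstest(data, 'CDF', [data, customCDF(data)]);
This segment compares your sample against an exponential distribution with the given rate parameter. Because the `data` generated earlier is normal and includes negative values, the test should be expected to reject here.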
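For distributions that MATLAB already knows, a distribution object from `makedist` is an alternative to writing the CDF by hand. The sketch below assumes simulated exponential data; note that MATLAB parameterizes the exponential distribution by its mean `mu`, which is `1/lambda` for a rate `lambda`.

```matlab
rng(8);                                   % fixed seed for a repeatable run
lambda  = 2;                              % illustrative rate parameter
expData = exprnd(1 / lambda, 150, 1);     % simulated exponential sample (mean 1/lambda)

% MATLAB's exponential distribution takes the mean, not the rate.
refDist = makedist('Exponential', 'mu', 1 / lambda);
[hE, pE, dE] = kstest(expData, 'CDF', refDist);

fprintf('h = %d, p = %.4f, ksstat = %.4f\n', hE, pE, dE);
```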
Using Multiple Hypothesis Tests
When conducting multiple comparisons, it’s important to adjust the significance level accordingly to avoid false positives. Techniques like the Bonferroni correction can be used to control for the increased risk of Type I errors.
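A Bonferroni adjustment can be sketched by dividing the significance level by the number of tests and passing the result through `'Alpha'`. The number of samples and their contents below are assumptions made for illustration.

```matlab
rng(9);                                   % fixed seed for a repeatable run
numTests = 5;                             % illustrative number of samples to test
alpha    = 0.05;                          % family-wise significance level
alphaAdj = alpha / numTests;              % Bonferroni-adjusted per-test level

samples = randn(100, numTests);           % one simulated sample per column
h = false(1, numTests);
p = zeros(1, numTests);

for k = 1:numTests
    [h(k), p(k)] = kstest(samples(:, k), 'Alpha', alphaAdj);
end

fprintf('adjusted alpha = %.3f, rejections = %d of %d\n', ...
        alphaAdj, sum(h), numTests);
```

Each individual test is held to the stricter level so that the chance of at least one false rejection across all tests stays at or below `alpha`.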

Practical Applications of `kstest` in Real-world Scenarios
Example: Quality Control in Manufacturing
In manufacturing, the K-S test can be employed to analyze product consistency. Quality control teams might use the K-S test to determine whether product measurements conform to the expected distribution, thereby ensuring consistent product quality.
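A quality-control check of this kind could be sketched as follows. The target mean, tolerance, units, and the simulated measurements are all hypothetical; in practice the reference parameters would come from the product specification, not from the measured batch.

```matlab
rng(10);                                  % fixed seed for a repeatable run
targetMean = 10.0;                        % hypothetical nominal dimension (mm)
targetStd  = 0.05;                        % hypothetical process standard deviation (mm)

% Simulated batch of 60 measurements standing in for real inspection data.
measurements = targetMean + targetStd * randn(60, 1);

specDist = makedist('Normal', 'mu', targetMean, 'sigma', targetStd);
[hQC, pQC, dQC] = kstest(measurements, 'CDF', specDist);

if hQC
    fprintf('Batch deviates from spec distribution (p = %.4f, D = %.4f)\n', pQC, dQC);
else
    fprintf('No significant deviation from spec (p = %.4f, D = %.4f)\n', pQC, dQC);
end
```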
Example: Finance and Risk Management
In finance, the K-S test can assess whether asset returns adhere to a particular statistical distribution, which aids in risk management decisions and investment strategies.

Conclusion
The `kstest` function in MATLAB is an invaluable tool for statisticians and researchers needing to assess whether a sample conforms to a specific distribution. Understanding how to use this function effectively enhances your statistical testing capabilities and allows for more informed decision-making in various fields. By working through the examples and applications in this article, readers are encouraged to apply their knowledge of the K-S test in appropriate scenarios.