The Kolmogorov-Smirnov (K-S) test in MATLAB is a non-parametric test used to determine if two samples come from the same distribution or if a sample follows a specified distribution.
[h, p] = kstest2(sample1, sample2); % Perform the K-S test between two samples
Understanding the Kolmogorov-Smirnov Test
What is the Kolmogorov-Smirnov Test?
The Kolmogorov-Smirnov Test (K-S Test) is a non-parametric statistical test that compares the cumulative distribution functions of two samples or compares a sample distribution with a reference probability distribution. This test is useful for assessing whether two datasets differ significantly or whether a dataset fits a specified distribution.
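For reference, the comparison of cumulative distribution functions can be written as a single statistic. The notation here is an assumption introduced for illustration: $F_n$ is the empirical CDF of a sample of size $n$, $F$ is the reference CDF, and $F_{1,n}$, $F_{2,m}$ are the empirical CDFs of two samples of sizes $n$ and $m$.

```latex
% One-sample statistic: largest vertical distance between the
% empirical CDF and the reference CDF
D_n = \sup_x \left| F_n(x) - F(x) \right|

% Two-sample statistic: largest vertical distance between the
% two empirical CDFs
D_{n,m} = \sup_x \left| F_{1,n}(x) - F_{2,m}(x) \right|
```

A large value of either statistic is evidence against the null hypothesis.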
Importance in Statistical Analysis
Using the K-S test offers several benefits:
- It does not assume any specific distribution for the data.
- It can be applied to small sample sizes as well as larger ones.
- Its test statistic is the largest vertical gap between two cumulative distribution functions, so the result is easy to relate to a plot of the empirical CDFs.

Setting Up MATLAB for the K-S Test
Required MATLAB Toolboxes
To perform the K-S test in MATLAB effectively, you will need the Statistics and Machine Learning Toolbox. This toolbox includes functions for statistical analysis, including `kstest` and `kstest2`, which are necessary for conducting the K-S tests.
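A quick way to confirm that the functions are available is to check whether MATLAB can find them on the path. This is a minimal sketch; the variable names are illustrative.

```matlab
% Check whether the K-S test functions are visible on the MATLAB path.
% exist(..., 'file') returns 2 when the name resolves to a file.
hasKstest  = exist('kstest', 'file') == 2;
hasKstest2 = exist('kstest2', 'file') == 2;

if hasKstest && hasKstest2
    fprintf('kstest and kstest2 are available.\n');
else
    fprintf('K-S test functions not found; check the toolbox installation.\n');
end
```

Running `ver` in the Command Window also lists the installed products.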
Installing MATLAB
If you do not already have MATLAB installed, you can download it from the MathWorks website. Follow the installation prompts to set it up. Familiarize yourself with the MATLAB interface, focusing on the Command Window, Editor, and workspace.

Performing a One-Sample K-S Test in MATLAB
Introduction to One-Sample K-S Test
A one-sample K-S test compares the cumulative distribution of a single sample to a specified continuous distribution, often to assess if the sample deviates from a theoretical distribution.
MATLAB Syntax for One-Sample K-S Test
The main function for performing a one-sample K-S test in MATLAB is `kstest`. The general syntax is:
[h, p] = kstest(data, 'CDF', your_cdf)
- data: The sample data you wish to test.
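- 'CDF', your_cdf: The hypothesized cumulative distribution function, supplied either as a two-column matrix (data values and the corresponding CDF values) or as a probability distribution object. If this argument is omitted, `kstest` compares the data against the standard normal distribution.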
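As a sketch of the 'CDF' argument, the snippet below tests a sample against a normal distribution with a non-zero mean, using both accepted forms of the argument. The parameter values `mu` and `sigma` are hypothetical and are fixed in advance rather than estimated from the sample.

```matlab
rng(7);                                   % fix the seed so the run is repeatable
mu = 5;                                   % hypothetical mean, chosen in advance
sigma = 2;                                % hypothetical standard deviation
x = mu + sigma * randn(200, 1);           % sample from N(mu, sigma^2)

% Form 1: probability distribution object
pd = makedist('Normal', 'mu', mu, 'sigma', sigma);
[h, p] = kstest(x, 'CDF', pd);

% Form 2: two-column matrix of [values, hypothesized CDF at those values]
[h2, p2] = kstest(x, 'CDF', [x, normcdf(x, mu, sigma)]);

fprintf('Object form: h = %d, p = %.4f\n', h, p);
fprintf('Matrix form: h = %d, p = %.4f\n', h2, p2);
```

The one-sample test assumes the reference distribution is fully specified. If the parameters are estimated from the same data, the reported p-value is no longer reliable; for the normal case, `lillietest` is designed for that situation.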
Example of a One-Sample K-S Test in MATLAB
To illustrate, consider testing a random sample against the standard normal distribution (mean 0, standard deviation 1), which is the default reference for `kstest`. The following code snippet demonstrates:
data = randn(100, 1); % Generate a random sample from the standard normal distribution
[h, p] = kstest(data); % Perform the K-S test against the standard normal distribution
fprintf('Test result: h = %d, p-value = %.4f\n', h, p);
In this example:
- `h` is the test result; a value of 1 indicates the null hypothesis (that the sample comes from the specified distribution) is rejected.
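- `p` is the p-value; a value below the chosen significance level (0.05 by default) is treated as statistically significant evidence against the null hypothesis. It does not prove that the sample comes from a different distribution, and a large p-value does not prove that it comes from the specified one.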
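`kstest` also accepts an 'Alpha' name-value argument and returns the test statistic and critical value as additional outputs. The sketch below uses a 1% significance level; the choice of 0.01 is only for illustration.

```matlab
rng(42);                          % fix the seed so the run is repeatable
x = randn(100, 1);                % sample drawn from the standard normal

% Request the K-S statistic (ksstat) and critical value (cv) as well,
% and test at the 1% level instead of the default 5%.
[h, p, ksstat, cv] = kstest(x, 'Alpha', 0.01);

fprintf('h = %d, p = %.4f, D = %.4f, critical value = %.4f\n', ...
        h, p, ksstat, cv);
```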

Performing a Two-Sample K-S Test in MATLAB
Introduction to Two-Sample K-S Test
The two-sample K-S test evaluates whether two independent samples originate from the same continuous distribution.
MATLAB Syntax for Two-Sample K-S Test
To conduct a two-sample test in MATLAB, use the function `kstest2`. The general syntax is:
[h, p] = kstest2(data1, data2)
- data1 and data2 are the two samples being compared.
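The two samples do not need to be the same size, and `kstest2` accepts 'Alpha' and 'Tail' name-value arguments. The following is a minimal sketch with made-up sample sizes and a made-up shift of 0.5.

```matlab
rng(3);                              % fix the seed so the run is repeatable
x1 = randn(80, 1);                   % first sample, 80 observations
x2 = randn(120, 1) + 0.5;            % second sample, 120 observations, shifted

% Two-sided test at the 1% level, also returning the test statistic
[h, p, ks2stat] = kstest2(x1, x2, 'Alpha', 0.01);

% One-sided test: the alternative is that the CDF of x1 is larger
% than the CDF of x2
[hTail, pTail] = kstest2(x1, x2, 'Tail', 'larger');

fprintf('Two-sided: h = %d, p = %.4f, D = %.4f\n', h, p, ks2stat);
fprintf('One-sided: h = %d, p = %.4f\n', hTail, pTail);
```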
Example of a Two-Sample K-S Test in MATLAB
Consider a scenario where we want to compare two different random samples:
data1 = randn(100, 1); % Generate first random sample
data2 = randn(100, 1) + 1; % Generate second random sample with a shift
[h, p] = kstest2(data1, data2); % Perform the K-S test
fprintf('Test result: h = %d, p-value = %.4f\n', h, p);
In this example, the code generates two independent random samples, where the second sample is shifted by one unit. Because that shift is as large as the standard deviation of each sample, the test will typically return `h = 1` with a very small p-value, indicating a significant difference between the two distributions.

Visualizing the Results
Importance of Visualization in Statistical Testing
Visual representation of data helps convey the results of the K-S test more effectively. Using plots, it becomes easier to identify discrepancies between distributions and understand statistical results at a glance.
Plotting the Empirical Cumulative Distribution Functions (CDFs)
To visually compare the samples `data1` and `data2` from the two-sample example, plot their empirical cumulative distribution functions (CDFs) as shown below:
[f1, x1] = ecdf(data1); % Empirical CDF of the first sample
[f2, x2] = ecdf(data2); % Empirical CDF of the second sample
figure; % Create a new figure
stairs(x1, f1, 'r'); hold on; % Draw each empirical CDF as a step function
stairs(x2, f2, 'b'); hold off;
xlabel('Value');
ylabel('Empirical CDF');
legend('Sample 1', 'Sample 2');
title('Empirical CDFs Comparison');
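In this plot, the empirical CDFs of the two datasets are displayed together, allowing you to visually assess their differences. The larger the maximum vertical gap between the curves, the stronger the evidence against a shared distribution, which is consistent with a small p-value from the test.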
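The K-S statistic itself is the largest vertical distance between the two empirical CDFs, so it can be computed by hand and checked against `kstest2`. The sketch below regenerates the samples so that it runs on its own, and it relies on implicit expansion (MATLAB R2016b or later).

```matlab
rng(0);                                  % fix the seed so the run is repeatable
x1 = randn(100, 1);                      % first sample
x2 = randn(100, 1) + 1;                  % second sample, shifted by one unit

% Evaluate both empirical CDFs at every pooled data point.
pooled = sort([x1; x2]);                 % 200-by-1 vector of evaluation points
F1 = mean(x1 <= pooled', 1);             % fraction of x1 at or below each point
F2 = mean(x2 <= pooled', 1);             % fraction of x2 at or below each point

D = max(abs(F1 - F2));                   % largest vertical gap

[~, ~, ks2stat] = kstest2(x1, x2);       % statistic reported by kstest2
fprintf('Manual D = %.4f, kstest2 D = %.4f\n', D, ks2stat);
```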

Interpreting the Results
Understanding the Output of K-S Tests
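Interpreting the output of the K-S test requires understanding the null hypothesis. For the one-sample test, it states that the sample comes from the specified distribution; for the two-sample test, it states that both samples come from the same distribution. If the p-value is less than the significance level (commonly set at 0.05), you reject the null hypothesis and conclude that the observed difference is statistically significant. If it is not, you fail to reject the null hypothesis, which is not the same as showing that the distributions are identical.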
Common Misinterpretations
One common misconception is to read the p-value as the probability that the null hypothesis is true or false. Instead, the p-value is the probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis is true. State this clearly when reporting results to avoid misunderstandings.
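A small simulation can make this concrete. When the null hypothesis is true, a test at significance level alpha should reject in roughly a fraction alpha of repeated experiments. The sketch below assumes 2000 trials of 50 observations each; both numbers are arbitrary.

```matlab
rng(1);                               % fix the seed so the run is repeatable
nTrials = 2000;                       % number of simulated experiments (arbitrary)
n = 50;                               % observations per experiment (arbitrary)
alpha = 0.05;                         % significance level

rejections = false(nTrials, 1);
for k = 1:nTrials
    x = randn(n, 1);                  % the null hypothesis is true by construction
    rejections(k) = kstest(x, 'Alpha', alpha);
end

rate = mean(rejections);              % observed false-rejection rate
fprintf('Rejection rate under the null: %.3f (alpha = %.2f)\n', rate, alpha);
```

The observed rate should be close to alpha, which is what the significance level controls. It says nothing about the probability that the null hypothesis itself is true.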

Practical Applications of the K-S Test
Real-World Scenarios
The K-S test finds application in various disciplines, such as:
- Finance: Analyzing the distribution of returns to assess the performance of financial models.
- Machine Learning: Comparing the output distributions of different models to evaluate stability and performance.
Case Study
Consider a practical scenario in finance, where you want to compare the distribution of daily returns of two different stock portfolios. You would collect return data, apply the two-sample K-S test using MATLAB, and visualize the results to interpret how similar the behaviors of the two portfolios are.
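The workflow might look like the following sketch. The return series are synthetic and generated only for illustration: portfolio A uses normally distributed returns and portfolio B uses heavier-tailed Student's t returns. The sample size, mean, and scale are hypothetical values, not real market data.

```matlab
rng(11);                                        % fix the seed so the run is repeatable
nDays = 250;                                    % hypothetical number of trading days

% Synthetic daily returns (illustrative only, not real data)
returnsA = 0.0005 + 0.010 * randn(nDays, 1);    % normal returns
returnsB = 0.0005 + 0.010 * trnd(3, nDays, 1);  % heavy-tailed t returns, 3 d.o.f.

% Two-sample K-S test on the return distributions
[h, p, ks2stat] = kstest2(returnsA, returnsB);
fprintf('h = %d, p = %.4f, D = %.4f\n', h, p, ks2stat);

% Compare the empirical CDFs visually
[fA, xA] = ecdf(returnsA);
[fB, xB] = ecdf(returnsB);
figure;
stairs(xA, fA, 'r'); hold on;
stairs(xB, fB, 'b'); hold off;
xlabel('Daily return');
ylabel('Empirical CDF');
legend('Portfolio A', 'Portfolio B', 'Location', 'southeast');
title('Empirical CDFs of Daily Returns');
```

With real data, `returnsA` and `returnsB` would be replaced by the observed return series.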

Conclusion
The K-S test in MATLAB serves as a powerful tool for statistical analysis, enabling users to compare distributions effectively. By understanding its implementation, interpretation, and practical applications, you can enhance your data analysis skills significantly.

FAQs About the K-S Test in MATLAB
What are the assumptions of the K-S test?
The K-S test assumes that the sample data are continuous and independent. It also assumes that there are no ties in the data points.
How does the K-S test handle ties in data?
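The K-S test is designed for continuous data, where ties occur with probability zero. It can still be run on discrete or heavily rounded data, but the p-values are then only approximate and the test tends to be conservative, so results should be interpreted with caution.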
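Before running the test, it can help to check how many tied values a sample contains. This is a minimal sketch using deliberately rounded data so that ties are present; the variable names are illustrative.

```matlab
rng(5);                                   % fix the seed so the run is repeatable
x = round(10 * rand(200, 1));             % rounded values, so ties are guaranteed

nTies = numel(x) - numel(unique(x));      % observations that repeat an earlier value
hasTies = nTies > 0;

if hasTies
    fprintf('%d of %d observations are tied; treat K-S p-values with caution.\n', ...
            nTies, numel(x));
end
```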
Can the K-S test be used for non-parametric data?
Strictly speaking, "non-parametric" describes the test rather than the data: the K-S test makes no assumption about the shape of the underlying distribution, so it can be applied to continuous data of any distribution.