What is Correlation
Correlation refers to any of a broad class of statistical relationships that involve dependence between variables. Familiar examples of dependent phenomena include the correlation between the physical features of parents and their offspring, and the correlation between the price of a product and the demand for it.
Uses of Correlation
Correlations are useful because they can indicate a predictive relationship that can be exploited in practice. For instance, an electrical utility may produce less power on a mild day, based on the correlation between weather and electricity demand. In this example there is a causal relationship, because extreme weather leads people to use more electricity for heating or cooling. In general, however, statistical dependence alone is not enough to demonstrate that such a causal relationship exists.
Formally, dependence refers to any situation in which random variables do not satisfy the mathematical condition of probabilistic independence. In a loose sense, correlation means any departure of two or more random variables from independence; in technical usage it refers to any of several more specialized types of relationship between mean values. Several correlation coefficients exist, often denoted by r or ρ, that measure the degree of correlation. The most common is the Pearson correlation coefficient, which is sensitive only to a linear relationship between two variables. Other correlation coefficients are more robust than Pearson's, meaning that they are more sensitive to nonlinear relationships. Mutual information can also be used to measure the dependence between two variables.
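The difference between linear correlation and dependence in general can be illustrated with a small sketch. The data below are invented for illustration, and the helper names `pearson` and `mutual_information` are hypothetical; the mutual information is a simple plug-in estimate that only makes sense for discrete values like these.

```python
import math
from collections import Counter

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def mutual_information(xs, ys):
    """Plug-in estimate of mutual information (in nats) for discrete data."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x, count_y = Counter(xs), Counter(ys)
    # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), summed over observed pairs
    return sum(
        (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in joint.items()
    )

# Illustrative data (an assumption, not from the article): y is fully
# determined by x, but the relation is a parabola, not a straight line.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

r = pearson(xs, ys)
mi = mutual_information(xs, ys)
print(f"Pearson r = {r:.3f}")
print(f"Mutual information = {mi:.3f} nats")
```

Here the Pearson coefficient comes out as zero even though y depends completely on x, while the mutual information is clearly positive.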
What is a Correlation Matrix
A correlation matrix is used to examine the dependence among several variables at the same time. The result is a table containing the correlation coefficient between each variable and every other variable. Several methods of correlation analysis are available, including the parametric Pearson correlation and the rank-based Kendall and Spearman correlations. A correlogram is used to visualize a correlation matrix.
When people speak of a correlation matrix, they usually mean a matrix of Pearson correlations. However, such correlations are affected by outliers, non-normality, unequal variances, and nonlinearities. The main competitor of the Pearson correlation coefficient is the Spearman rank correlation coefficient, which is computed by applying the Pearson formula to the ranks of the data rather than to the actual data values. Doing so reduces many of the distortions that affect the Pearson correlation. To help compare the two kinds of correlation matrix, a matrix of their differences can be displayed, which shows which pairs of variables need further investigation. You can also specify a set of partial variables. The linear influence of these variables is removed from the correlation matrix, which provides a statistical adjustment to the remaining variables by means of multiple regression. Note, however, that for Spearman correlations this removal takes place after the complete correlation matrix has been formed.
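The idea of comparing a Pearson matrix with a Spearman matrix through a matrix of differences might be sketched as follows. The data, the variable names, and the cut-off `THRESHOLD` are all assumptions made for illustration; no particular statistics package is implied.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation: the Pearson formula applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))

def corr_matrix(columns, corr):
    """Square matrix of corr(a, b) for every pair of columns."""
    return [[corr(a, b) for b in columns] for a in columns]

# Illustrative data (an assumption): b follows a except for one outlier.
data = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [1, 2, 3, 4, 5, 60],
    "c": [2, 4, 6, 8, 10, 12],
}
names = list(data)
columns = [data[name] for name in names]

pearson_m = corr_matrix(columns, pearson)
spearman_m = corr_matrix(columns, spearman)
diff_m = [
    [s - p for s, p in zip(s_row, p_row)]
    for s_row, p_row in zip(spearman_m, pearson_m)
]

THRESHOLD = 0.1  # hypothetical cut-off for "needs more investigation"
flagged = [
    (names[i], names[j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(diff_m[i][j]) > THRESHOLD
]
print(flagged)
```

Pairs that involve the variable with the outlier show a large gap between the two coefficients, which is the signal that those pairs deserve a closer look.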
When more than one independent variable is involved, the full set of pairwise correlations is summarized compactly in a correlation matrix. In regression analysis, these correlations are studied for two purposes: to detect outliers and to identify collinearity. For outliers, a marked difference between the Pearson correlation coefficient (a parametric measure) and the Spearman rank correlation coefficient (a nonparametric measure) is the warning sign. For collinearity, high pairwise correlations can be the first indication of a problem. The Pearson correlation is unduly influenced by outliers, nonlinearities, unequal variances, and non-normality. Because of these problems, the Spearman correlation coefficient, which is based on the ranks of the data rather than the actual data, can be the better choice for examining relationships among variables.
Finally, the patterns of missing values in correlation and regression analysis can be very complicated. Missing values can be omitted in either a row-wise or a pair-wise manner. When only a few observations have missing values, row-wise omission may be preferred, particularly with large data sets. The row-wise procedure deletes the entire observation from the analysis. If, on the other hand, the missing values are scattered randomly throughout the data and row-wise omission would delete more than 25% of the observations, pair-wise omission will be the better option for capturing the relationships among the variables.
Although the pair-wise technique can be said to use all of the available data, the resulting correlation matrix may have mathematical and interpretation problems. Mathematically, the matrix may not have a positive determinant. And because each correlation may be based on a different set of rows, practical interpretation can be difficult, if not illogical.
The Spearman correlation coefficient measures the monotonic association between two variables in terms of ranks. It measures whether one variable increases or decreases along with the other, even when the relationship between them is not linear or bivariate normal. Computationally, each variable is ranked separately, and the ordinary Pearson correlation coefficient is calculated on the ranks. This nonparametric coefficient is a better measure of the relationship between two variables when outliers, nonlinearity, non-normality, or non-constant variance may be present.
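A deliberately extreme sketch shows how pair-wise omission can produce a matrix with a negative determinant. The data are invented for illustration, `None` stands for a missing value, and `pairwise_corr` is a hypothetical helper name.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def pairwise_corr(a, b):
    """Pearson correlation using only the rows where both values are present."""
    pairs = [(u, v) for u, v in zip(a, b) if u is not None and v is not None]
    us, vs = zip(*pairs)
    return pearson(us, vs)

# Illustrative data (an assumption): every row is missing one variable,
# so each pair of variables is observed on a different set of rows.
x = [1, 2, 3, 1, 2, 3, None, None, None]
y = [1, 2, 3, None, None, None, 1, 2, 3]
z = [None, None, None, 1, 2, 3, 3, 2, 1]

r_xy = pairwise_corr(x, y)
r_xz = pairwise_corr(x, z)
r_yz = pairwise_corr(y, z)

# Determinant of the symmetric 3x3 matrix with ones on the diagonal:
# [[1, r_xy, r_xz], [r_xy, 1, r_yz], [r_xz, r_yz, 1]]
det = 1 + 2 * r_xy * r_xz * r_yz - r_xy ** 2 - r_xz ** 2 - r_yz ** 2

# Row-wise omission would keep only rows with no missing value at all.
complete_rows = [row for row in zip(x, y, z) if None not in row]

print(r_xy, r_xz, r_yz, det, len(complete_rows))
```

In this sketch x appears perfectly positively related to both y and z, while y and z appear perfectly negatively related to each other. No single sample could produce all three results at once, and the negative determinant reflects that.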
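The rank-then-correlate computation can be sketched in a few lines. The data are illustrative, and `simple_ranks` is a hypothetical helper that assumes there are no tied values; real data with ties would need average ranks.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simple_ranks(values):
    """Rank 1 for the smallest value, 2 for the next, and so on (no ties)."""
    position = {v: i + 1 for i, v in enumerate(sorted(values))}
    return [position[v] for v in values]

# Illustrative data (an assumption): y always rises with x, but not linearly.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]

pearson_r = pearson(xs, ys)
spearman_r = pearson(simple_ranks(xs), simple_ranks(ys))

print(f"Pearson  = {pearson_r:.3f}")
print(f"Spearman = {spearman_r:.3f}")
```

Because y never decreases as x increases, the ranks line up exactly and the Spearman coefficient reaches 1, while the Pearson coefficient stays below 1 because the relation is curved.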
Reading a Correlation Matrix
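A correlation matrix is calculated from the sample variances and covariances of the data variables. For instance, the correlation coefficient of two variables x and y is given by the following formula:
$$ r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} $$
Here s_xy is the sample covariance of x and y, and s_x and s_y are their sample standard deviations (the square roots of the sample variances).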
This formula is applied to every pair of variables in the variance-covariance matrix to produce the correlation matrix. The diagonal elements of the correlation matrix (first row, first element; second row, second element; and so on) are always one, because each of them is the correlation of a variable with itself. Whenever you calculate the correlation coefficient of a variable x with itself, you get one.
The correlation matrix is also symmetric: the correlation of x with y equals the correlation of y with x, so the upper half of the matrix mirrors the lower half.
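As a minimal sketch of this calculation, the code below builds a variance-covariance matrix and divides each covariance by the product of the two standard deviations. The three data columns and the function names are assumptions chosen for illustration; in practice a library routine such as NumPy's `corrcoef` would normally be used instead.

```python
import math

def covariance(xs, ys):
    """Sample covariance (divisor n - 1) of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation_matrix(columns):
    """Correlation matrix derived from the variance-covariance matrix."""
    k = len(columns)
    cov = [[covariance(a, b) for b in columns] for a in columns]
    # The diagonal of the covariance matrix holds the sample variances.
    sd = [math.sqrt(cov[i][i]) for i in range(k)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(k)] for i in range(k)]

# Illustrative data (an assumption): three variables, five observations.
x = [2, 4, 6, 8, 10]
y = [1, 3, 2, 5, 4]
z = [9, 7, 6, 3, 1]

R = correlation_matrix([x, y, z])
for row in R:
    print(["%.3f" % value for value in row])
```

Printing the matrix shows ones down the diagonal and the same value at positions (i, j) and (j, i).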
All correlation coefficients lie between minus 1 and plus 1. With real data a coefficient is rarely exactly minus 1 or plus 1, because that would require one variable to be an exact linear function of the other, and in practice few variables keep such a fixed relationship with another at all times. A positive coefficient means that an increase in the value of one variable tends to be accompanied by an increase in the value of the other; a negative coefficient means that an increase in one tends to be accompanied by a decrease in the other.
The closer a coefficient is to plus 1 or minus 1, the stronger the relationship between the two variables. If the coefficient, whether positive or negative, is small in magnitude, there is little or no linear relationship between the variables. A real-world example is the relationship between gold prices and medicine prices: a rise or fall in the price of gold has little or no effect on the price of medicine, so the two variables are weakly related or unrelated.
Correlation in the Real World
Correlation Matrix in the Stock Market
In the stock market, a correlation matrix can show the long-term, medium-term, or short-term relationship among variables. For instance, the prices of silver and gold depend on each other in the long term: when one rises, the other rises too. This indicates a positive long-term correlation between them, although the same may not hold in the medium or short term. A correlation matrix can therefore be of great use to traders. To make an accurate prediction, data is usually gathered over a considerably long time span, put into numerical form, and then examined with a correlation matrix.
Correlation Matrix Used by Actuaries
Actuaries also use correlation matrices to measure risk for banks and large organizations. In this case the variables are risk factors, and the relationships among them are examined. A relationship may be categorized as high risk when the value is above 0.75, low risk when it is below 0.25, and medium risk when it lies between 0.25 and 0.75. When many risk factors are included, the correlation matrix usually becomes difficult to work with, and in many situations very hard to interpret. To address this problem, an eigendecomposition of the original correlation matrix is calculated, which makes it much simpler to understand and study.
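An eigendecomposition of a correlation matrix could be sketched as follows. This assumes NumPy is available; the matrix is a made-up example with three risk factors that all correlate at 0.5, not actuarial data.

```python
import numpy as np

# Illustrative correlation matrix (an assumption): three risk factors,
# each pair correlated at 0.5.
corr = np.array([
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
])

# eigh is intended for symmetric matrices; it returns the eigenvalues in
# ascending order, with the matching eigenvectors as columns.
eigenvalues, eigenvectors = np.linalg.eigh(corr)

# Reorder so that the largest eigenvalue comes first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Share of the total variation carried by each component. The eigenvalues
# of a correlation matrix sum to the number of variables.
explained = eigenvalues / eigenvalues.sum()

print("eigenvalues:", np.round(eigenvalues, 3))
print("share of variation:", np.round(explained, 3))
```

A single dominant eigenvalue, as in this example, suggests that one common component accounts for most of the co-movement among the factors, which is easier to reason about than every pairwise entry of a large matrix.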