{"id":128,"date":"2021-01-06T23:13:29","date_gmt":"2021-01-06T23:13:29","guid":{"rendered":"http:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/ziyang-yang\/?p=128"},"modified":"2021-04-30T12:13:00","modified_gmt":"2021-04-30T12:13:00","slug":"dimensionality-reduction-pca","status":"publish","type":"post","link":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/ziyang-yang\/2021\/01\/06\/dimensionality-reduction-pca\/","title":{"rendered":"Dimensionality Reduction – PCA"},"content":{"rendered":"\n
This blog gives a simple, general introduction to one dimensionality reduction technique: principal component analysis (PCA).<\/span> We assume the reader has basic knowledge of PCA and wants to understand visually how it works.<\/p>\n\n\n\n Nowadays, huge amounts of data are generated every second, and these data are often high-dimensional. In that setting, exploring, visualising and analysing the data is hard and time-consuming. Our objective is therefore to find a low-dimensional representation of the data that captures as much of the relevant or interesting information as possible. This is dimensionality reduction (e.g., reducing 3 dimensions to 2 dimensions, as in the figure below).<\/p>\n\n\n\n PCA is one of the most widely used tools for dimensionality reduction. It reduces \\(d\\) variables to \\(k\\) principal components (\\(k \\ll d\\)), each of which is a linear combination of the original variables. That’s why PCA is classed as a linear dimensionality reduction method. Mathematically, let \\(X_1,X_2,\\dots,X_d\\) denote the \\(d\\) variables in the dataset. Our aim is to find components \\(Z_j\\), \\(j=1,\\dots,k\\), of the form \\(Z_j = \\phi_{j,1}X_1+\\phi_{j,2}X_2+\\dots+\\phi_{j,d}X_d\\).<\/p>\n\n\n\n The coefficient \\(\\phi_{j,1}\\) is the weight assigned to the original variable \\(X_1\\) in the principal component \\(Z_j\\), that is, how much \\(X_1\\) contributes to the \\(j\\)-th component. These weights can be used to interpret what each principal component means.<\/p>\n\n\n\n The GIF above shows two dimensions being reduced to one. We can see that the data cloud is shaped like an ellipse.<\/p>\n\n\n\n Currently, we have two dimensions: the x-axis and the y-axis. We now try to reduce them to a single dimension that retains as much information as possible. This means that, instead of using the two axes, we express our data along one new axis.<\/p>\n\n\n\n PCA is easy to use and has several advantages:<\/p>\n\n\n\n However, it also has some disadvantages:<\/p>\n\n\n\n To see how to implement PCA in R, you could read:<\/p>\n\n\n\n
What is Principal Component Analysis?<\/h1>\n\n\n\n
\n\n\n\nHow can we understand PCA visually?<\/h1>\n\n\n\n
\n\n\n\n
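The two-to-one-dimension reduction described in this post can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not the method behind the original animation: the data are synthetic, the names (X, phi_1, Z_1) are chosen here for illustration, and information is measured as variance, which is the usual PCA criterion.

```python
import numpy as np

# Synthetic, made-up data: 200 points forming an elongated (elliptical) 2-D cloud.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0.0, 0.0],
                            cov=[[3.0, 1.5], [1.5, 1.0]],
                            size=200)

# Centre each variable so the component describes variation around the mean.
X_centred = X - X.mean(axis=0)

# Eigen-decomposition of the sample covariance matrix.
# eigh returns eigenvalues in ascending order for symmetric matrices.
cov = np.cov(X_centred, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# The eigenvector with the largest eigenvalue holds the weights
# (phi_{1,1}, phi_{1,2}) of the first principal component.
phi_1 = eigenvectors[:, -1]

# Z_1 = phi_{1,1} * X_1 + phi_{1,2} * X_2 for every observation:
# the data expressed along one new axis.
Z_1 = X_centred @ phi_1

# Share of the total variance retained by the single new axis.
explained = eigenvalues[-1] / eigenvalues.sum()
print('weights:', phi_1)
print('share of variance kept:', round(explained, 3))
```

The new axis points along the long direction of the ellipse, so Z_1 keeps most of the spread in the data. The sign of phi_1 is arbitrary: flipping it gives an equally valid component.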
Extra information<\/h1>\n\n\n\n
\n\n\n\nOther supplementary material:<\/h1>\n\n\n\n
\n\n\n\n