Dimensionality Reduction In Python

What is Dimensionality Reduction?

Dimensionality reduction is the process of reducing the number of variables (features) in a dataset, typically by removing or combining correlated and redundant ones. It is used in machine learning to obtain better predictor variables for regression and classification problems. Data science projects often involve datasets with many columns (features), which makes it difficult to visualize the data and to judge which features matter when choosing independent variables during feature processing. Some algorithms lose accuracy on high-dimensional data, and high-dimensional data also needs more storage and computation time. Dimensionality reduction has two components: feature selection and feature extraction. Feature selection searches for a subset of the original features and keeps those as the independent variables used to build the model. Feature extraction transforms the high-dimensional data into a new, lower-dimensional set of features. Examples of feature extraction are LDA and PCA.
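The difference between the two components can be seen in a small sketch. It assumes scikit-learn's built-in wine dataset as a stand-in for the article's data, and the choice of 5 features and the `f_classif` scoring function are illustrative assumptions, not part of the original post.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler

# Assumption: the built-in wine dataset (178 samples, 13 features) stands in for the article's data
X, y = load_wine(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Feature selection: keep 5 of the original columns, unchanged
selector = SelectKBest(score_func=f_classif, k=5)
X_selected = selector.fit_transform(X_scaled, y)
kept_columns = selector.get_support(indices=True)

# Feature extraction: build 5 new columns, each a combination of all 13 originals
X_extracted = PCA(n_components=5).fit_transform(X_scaled)

print("Selected original column indices:", kept_columns)
print("Shapes:", X_selected.shape, X_extracted.shape)
```

Both results have the same shape, but the selected columns are still original measurements, while the extracted columns are new features that no longer correspond to a single original column.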

PCA (Principal Component Analysis)

PCA is a method for transforming features in a dataset by combining them into uncorrelated linear combinations. These new features, or principal components, sequentially maximize the variance represented (i.e. the first principal component has the most variance, the second principal component has the second most, and so on). As a result, PCA is useful for dimensionality reduction because you can set an arbitrary variance cutoff.
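The variance cutoff mentioned above can be sketched as follows. This is a minimal example that assumes scikit-learn's built-in wine dataset and a 95% cutoff; both are illustrative choices. Passing a float between 0 and 1 as `n_components` asks scikit-learn's `PCA` to keep just enough components to reach that share of explained variance.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumption: built-in wine dataset, standardized so every feature has equal weight
X, _ = load_wine(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Keep as many components as needed to explain at least 95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)

ratios = pca.explained_variance_ratio_   # sorted from largest to smallest
cumulative = np.cumsum(ratios)

print("Components kept:", pca.n_components_)
print("Cumulative explained variance:", cumulative)
```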

In simple words, PCA takes a large, linearly separable, unlabeled dataset and extracts a new, smaller set of features from it. These new features are called principal components, and analyzing the data through them is called principal component analysis. PCA only looks for linear correlations between features, which is sometimes undesirable, and dropping components loses some information.

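# Import PCA from scikit-learn
from sklearn.decomposition import PCA
# wine_norm is assumed to be the normalized wine feature matrix prepared earlier
pca = PCA(n_components=6)
pca_values = pca.fit_transform(wine_norm)
# Fraction of the total variance explained by each principal component
var = pca.explained_variance_ratio_
print(var)
# Weights of the original features in the first principal component
print(pca.components_[0])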
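To see the data loss that comes with dropping components, one option is to map the reduced data back to the original space and measure the reconstruction error. The sketch below assumes scikit-learn's built-in wine dataset in place of `wine_norm` and uses `StandardScaler` as the normalization step; both are assumptions, since the article does not show how `wine_norm` was built.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumption: standardized built-in wine data plays the role of wine_norm
X, _ = load_wine(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Reduce 13 features to 6 components, then project back to 13 features
pca = PCA(n_components=6)
X_reduced = pca.fit_transform(X_scaled)
X_restored = pca.inverse_transform(X_reduced)

# Mean squared difference between the original and the restored data
mse = np.mean((X_scaled - X_restored) ** 2)
retained = pca.explained_variance_ratio_.sum()

print("Share of variance retained:", retained)
print("Reconstruction error (MSE):", mse)
```

Because the features are standardized to unit variance, the reconstruction error equals the share of variance that was discarded, so keeping more components lowers the error.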


Kernel PCA

Kernel PCA is a method for transforming the features of a dataset that is not linearly separable (non-linear). It uses a kernel function to map the data into a higher-dimensional space in which it becomes linearly separable.

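# Apply Kernel PCA (X and y are assumed to be the features and class labels loaded earlier)
import matplotlib.pyplot as plt
from sklearn.decomposition import KernelPCA
kpca = KernelPCA(kernel='rbf', gamma=15)
X_kpca = kpca.fit_transform(X)
# Plot the first two kernel principal components, colored by class
plt.title("Kernel PCA")
plt.scatter(X_kpca[:, 0], X_kpca[:, 1], c=y)
plt.show()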


LDA (Linear discriminant analysis)

LDA is a dimensionality reduction algorithm used in supervised classification projects. Discriminant analysis has a categorical dependent (predicted) variable and continuous predictor variables. LDA is mostly used as a preprocessing step and for pattern classification problems. It is also known as discriminant function analysis (DFA) or normal discriminant analysis (NDA).

In simple words, linear discriminant analysis (LDA) is a reduction technique applied to datasets with many variables, reducing them while retaining as much information as possible, in particular the information that separates the classes. Example: a multi-class dataset with multiple features that are correlated with one another. LDA projects this multi-class data onto one or more dimensions; how many are available depends on the data (at most the number of classes minus one). The reduced data can then be explored with histograms, scatter plots, and box plots to search for patterns.

LDA Extensions:

  1. Quadratic Discriminant Analysis (QDA): each class uses its own estimate of variance (or of covariance when there are multiple input variables).
  2. Flexible Discriminant Analysis (FDA): uses non-linear combinations of the inputs.
  3. Regularized Discriminant Analysis (RDA): adds regularization to the covariance estimate, giving a compromise between LDA and QDA.
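
# Import the LDA model from scikit-learn
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
# X_train, X_test and y_train are assumed to come from an earlier train/test split
lda = LDA(n_components=1)
# Fit on the training data only, then apply the same projection to the test data
X_train = lda.fit_transform(X_train, y_train)
X_test = lda.transform(X_test)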
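A self-contained version of the same idea is sketched below. It assumes scikit-learn's built-in wine dataset (3 classes, 13 features) and a 75/25 split; neither comes from the original post. With 3 classes, LDA can produce at most 2 components, so `n_components=2` is the largest valid value here.

```python
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: built-in wine dataset with 3 classes and 13 features
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Fit the scaler on the training data only to avoid leaking test information
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# LDA is supervised: fitting needs the class labels
lda = LinearDiscriminantAnalysis(n_components=2)
X_train_lda = lda.fit_transform(X_train_s, y_train)
X_test_lda = lda.transform(X_test_s)

print("Reduced shapes:", X_train_lda.shape, X_test_lda.shape)
print("Test accuracy of LDA used as a classifier:", lda.score(X_test_s, y_test))
```

Unlike PCA, the projection here is chosen using `y_train`, which is why the labels are passed to `fit_transform` but not to `transform`.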


Final Outcome

Finally, dimensionality reduction is used for data compression and for dealing with multicollinearity and low-variance features. It ignores redundant features and decreases computation time, at the cost of some data loss. PCA and Kernel PCA are both unsupervised algorithms: use PCA when the data is linearly separable, and use Kernel PCA when it is not. LDA is a supervised learning algorithm used on multi-dimensional data that has a categorical dependent variable.
