t-SNE Visualization of the 20 Newsgroups Dataset

The aim of this visualization is to understand how t-SNE works. The code for this project can be found here.

Interactive scatter plot: controls for N-gram and perplexity; hovering over a point shows the text of that element.

Comparison of t-SNE visualizations along with PCA

Approach

This scatter plot is generated by parsing the text of 11314 articles from the 20 target groups in the dataset. The articles are cleaned by lemmatizing them and removing special characters and stopwords. Each article is then turned into a feature vector using scikit-learn's TfidfVectorizer, and the dimensionality of the feature vectors is reduced using scikit-learn's t-SNE implementation. The scatter plot itself is rendered with Plotly.js.
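The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `clean` and `embed_documents`, the parameter defaults, and the regex-based cleaning are assumptions. Lemmatization is left out to keep the sketch self-contained, and stopword removal is delegated to `TfidfVectorizer`'s built-in English list.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import TSNE


def clean(text):
    """Lowercase the text and replace every non-letter, non-space character with a space."""
    return re.sub(r"[^a-z\s]", " ", text.lower())


def embed_documents(texts, ngram_range=(1, 1), perplexity=30.0, seed=0):
    """Return an (n_documents, 2) array of t-SNE coordinates (hypothetical helper)."""
    vectorizer = TfidfVectorizer(
        preprocessor=clean,       # assumed cleaning step; no lemmatization here
        stop_words="english",     # built-in stopword list
        ngram_range=ngram_range,  # e.g. (1, 2) for unigrams and bigrams
    )
    features = vectorizer.fit_transform(texts)  # sparse TF-IDF matrix

    tsne = TSNE(
        n_components=2,
        perplexity=perplexity,  # must be smaller than the number of documents
        init="random",          # PCA initialization is not supported for sparse input
        random_state=seed,      # fixed seed so repeated runs give the same layout
    )
    return tsne.fit_transform(features)
```

On the real data, `texts` could come from `sklearn.datasets.fetch_20newsgroups(subset="train").data`, which holds the 11314 training articles. The returned coordinates could then be exported (for example as JSON) for Plotly.js to draw. Running the function with different `ngram_range` and `perplexity` values is one way to see how those two settings change the layout.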