t-SNE Visualization of the 20 Newsgroups Dataset

The aim of this visualization is to understand how t-SNE works. The code for this project can be found here.

Comparison of t-SNE visualizations along with PCA

[Image grid: one scatter plot per combination of n-gram setting and projection method.
Columns: PCA, t-SNE Perplexity-10, t-SNE Perplexity-30, t-SNE Perplexity-50, t-SNE Perplexity-100.
Rows: Unigram; Unigram & Bigram; Unigram & Bigram & Trigram; Bigram; Trigram.]

Approach

This scatter plot is generated by parsing the text of 11,314 articles from the 20 target groups in the dataset. The articles are cleaned by lemmatizing and by removing special characters and stopwords. Each article is then turned into a feature vector using scikit-learn's TfidfVectorizer, and dimensionality reduction is applied to the feature vectors using scikit-learn's t-SNE. Plotly.js is used to render the scatter plot.
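
The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the toy corpus, the `clean` and `embed` helpers, and the parameter values are all assumptions made for the example. Lemmatization is omitted because it needs an extra NLP library, and a dozen short sentences stand in for the real articles (which `sklearn.datasets.fetch_20newsgroups(subset="train")` can load).

```python
import re

from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import TSNE

# Hypothetical toy corpus standing in for the 20 Newsgroups articles.
docs = [
    "The space shuttle launch was delayed by bad weather at the pad",
    "NASA engineers tested the new rocket engine before the orbital mission",
    "The satellite reached a stable orbit around the moon last night",
    "Astronauts repaired the solar panel during a long space walk",
    "The hockey team won the playoff game in sudden death overtime",
    "The goalie stopped forty shots during the final hockey game",
    "Baseball fans cheered when the pitcher threw a perfect game",
    "The coach traded two veteran players before the season opener",
    "The graphics card driver crashed when rendering large image files",
    "Install the latest video driver to fix the display flicker problem",
    "The hard disk controller failed after the operating system upgrade",
    "My monitor shows strange colors when the computer boots up",
]


def clean(text):
    """Lowercase and replace anything that is not a letter or whitespace."""
    return re.sub(r"[^a-z\s]", " ", text.lower())


def embed(texts, ngram_range=(1, 1), perplexity=None, random_state=0):
    """Project texts to 2-D: PCA if perplexity is None, otherwise t-SNE."""
    vectorizer = TfidfVectorizer(stop_words="english", ngram_range=ngram_range)
    # Densifying is only practical for a small corpus like this one.
    features = vectorizer.fit_transform([clean(t) for t in texts]).toarray()
    if perplexity is None:
        reducer = PCA(n_components=2, random_state=random_state)
    else:
        # Perplexity must be smaller than the number of samples.
        reducer = TSNE(
            n_components=2,
            perplexity=perplexity,
            init="random",
            random_state=random_state,
        )
    return reducer.fit_transform(features)


# Unigram & Bigram features, projected with PCA and with t-SNE.
pca_coords = embed(docs, ngram_range=(1, 2))
tsne_coords = embed(docs, ngram_range=(1, 2), perplexity=5)
print(pca_coords.shape, tsne_coords.shape)
```

The `ngram_range` argument corresponds to the rows of the comparison grid, for example `(1, 1)` for Unigram, `(1, 2)` for Unigram & Bigram, and `(3, 3)` for Trigram, while `perplexity` corresponds to the t-SNE columns. The resulting coordinates could then be exported (for instance as JSON) for plotting with Plotly.js.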