Approach

This scatter plot is generated by parsing the text of 11,314 articles from the 20 target groups in the dataset. The articles are first cleaned by lemmatizing them and removing special characters and stopwords. Each article is then converted into a feature vector using scikit-learn's TfidfVectorizer, and the feature vectors are reduced in dimensionality using scikit-learn's t-SNE implementation. The scatter plot itself is rendered with Plotly.js.
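The steps above can be sketched roughly as follows. This is a minimal illustration, not the original code. It makes several assumptions: a tiny hand-written corpus (`docs`) stands in for the 20 Newsgroups articles, the hypothetical `clean` helper only lowercases text and strips non-letter characters, and lemmatization is left out. The perplexity, initialization, and random seed are also chosen only so that the example runs on eight documents.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import TSNE

# Toy stand-in corpus (assumption); the real input would be the newsgroup articles.
docs = [
    "The hockey team won the game in overtime.",
    "The goalie made thirty saves in the final game!",
    "NASA launched a new satellite into orbit.",
    "The space shuttle mission was delayed by weather.",
    "My graphics card driver crashes under Windows.",
    "Which video card works best for 3D rendering?",
    "The senate debated the new gun control bill.",
    "Congress voted on the firearms legislation today.",
]


def clean(text):
    """Lowercase the text and replace anything that is not a letter or whitespace."""
    return re.sub(r"[^a-z\s]", " ", text.lower())


# TF-IDF features; scikit-learn's built-in English stopword list is applied here.
vectorizer = TfidfVectorizer(preprocessor=clean, stop_words="english")
X = vectorizer.fit_transform(docs)

# t-SNE down to 2 dimensions. Perplexity must be smaller than the number of
# samples, so it is set very low for this toy corpus (assumed value).
tsne = TSNE(n_components=2, perplexity=3, init="random", random_state=0)
embedding = tsne.fit_transform(X.toarray())  # dense is fine at this size

print(embedding.shape)
```

Each row of `embedding` is an (x, y) coordinate for one document, which is the kind of data a Plotly.js scatter trace would consume. On the full dataset, the dense conversion with `toarray()` would likely be too memory-hungry, so the sparse matrix or an intermediate reduction step would be a more realistic choice.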

t-SNE Visualization of 20 News Group dataset

Veeresh Elango

Comparison of t-SNE visualizations along with PCA

Approach