Quora Question Pairs
Can you identify question pairs that have the same intent?
Objective:
Given two Quora questions, determine whether they are similar, using traditional machine learning (ML) and deep learning (DL) models with extensive feature engineering.
Dataset:
The dataset comes from the Quora Question Pairs competition on Kaggle. It contains 404,290 question pairs, each with a label that denotes whether the two questions are similar. The data is split 90/10 into a training set of 363,861 question pairs and a test set of 40,429 question pairs.
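The set sizes above correspond to holding out 10% of the pairs. Here is a minimal sketch of such a split, assuming a shuffled index split with a fixed seed. The source does not say how the split was drawn, so the helper `split_indices`, the seed value, and the shuffling are illustrative assumptions.

```python
import random


def split_indices(n_pairs, test_fraction=0.1, seed=42):
    """Shuffle row indices and split them into (train, test) index lists.

    Hypothetical helper: the seed and the plain shuffle are assumptions,
    not the project's documented procedure.
    """
    indices = list(range(n_pairs))
    random.Random(seed).shuffle(indices)
    n_test = round(n_pairs * test_fraction)
    # The first n_test shuffled indices form the test set, the rest the training set.
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(404_290)
print(len(train_idx), len(test_idx))  # 363861 40429
```

A stratified split on the label would be a reasonable alternative if the class balance must be preserved in both sets.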
Approach:
The approach is divided into three parts:
- Linear Models (with Unigrams, Bigrams, and Trigrams)
  - Logistic Regression
  - Linear SVM
- Tree-based Models
  - Decision Trees
  - Random Forest
  - XGBoost
- Deep Learning Models
  - CBOW + MLP
  - GloVe + LSTM
  - GloVe + BiLSTM
  - GloVe + LSTM + Attention
  - GloVe + BiLSTM + Attention
  - BERT (Best Model)
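To make the unigram, bigram, and trigram inputs concrete, here is a small sketch of word n-gram extraction for a question pair. The Jaccard overlap features, the tokenizer regex, and the function names `word_ngrams` and `ngram_overlap_features` are assumptions for illustration. The project's actual features are described in its report and may differ.

```python
import re


def word_ngrams(text, n):
    """Return the set of word n-grams in a lowercased text."""
    # Simple tokenizer (an assumption): runs of letters, digits, and apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap_features(q1, q2, max_n=3):
    """Jaccard overlap of the two questions' n-gram sets for n = 1..max_n."""
    features = []
    for n in range(1, max_n + 1):
        a, b = word_ngrams(q1, n), word_ngrams(q2, n)
        union = a | b
        # An empty union happens when both questions are shorter than n words.
        features.append(len(a & b) / len(union) if union else 0.0)
    return features


features = ngram_overlap_features(
    "How do I learn Python?", "How can I learn Python?"
)
print(features)  # unigram, bigram, trigram overlap
```

Features like these could be fed to any of the linear or tree-based models listed above. A bag-of-n-grams vectorizer is the other common way to give a linear model n-gram inputs.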
→ More details can be found in the project report and the project repo.