Project Summary
This repository contains an assignment for the Machine Learning course. The goal is to build end-to-end pipelines for tabular, text, and image datasets using both traditional ML and deep learning approaches.
Learning Objectives
- Design reproducible data preprocessing and feature engineering pipelines.
- Train, evaluate, and compare traditional and deep-learning models.
- Work with tabular, text, and image datasets.
Architecture & Steps
Common Components
- Exploratory Data Analysis (EDA)
- Preprocessing & feature engineering
- Model training & hyperparameter tuning
- Evaluation & visualizations
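The "model training & hyperparameter tuning" component above can be sketched with cross-validated grid search. This is a minimal, hypothetical example on synthetic data — the actual models, parameter grids, and scoring used in the assignment may differ.

```python
# Hypothetical sketch: hyperparameter tuning with cross-validated grid search.
# The dataset and parameter grid are illustrative, not from the assignment.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Search over the regularization strength C with 5-fold cross-validation.
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
    scoring="accuracy",
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

`best_params_` and `cv_results_` can then feed the evaluation & visualization step (e.g. plotting score vs. `C`).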
Per-Modal Pipelines
Tabular
Steps: EDA → imputation → scaling → categorical encoding → feature selection → train/test split → model training → evaluation
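The steps above can be sketched as a single scikit-learn Pipeline. This is only an illustrative sketch: the column names, toy data, and chosen estimators are assumptions, not the assignment's actual features or models.

```python
# Illustrative sketch of the tabular pipeline: imputation, scaling,
# categorical encoding, feature selection, split, training, evaluation.
# Column names and data are hypothetical stand-ins for the Adult features.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "age": [25, 38, None, 52, 41, 29, 60, 33],
    "hours_per_week": [40, 50, 45, None, 60, 35, 20, 48],
    "workclass": ["Private", "Private", "State-gov", None,
                  "Private", "Self-emp", "Private", "State-gov"],
    "income_gt_50k": [0, 1, 0, 1, 1, 0, 0, 1],
})
X, y = df.drop(columns="income_gt_50k"), df["income_gt_50k"]

numeric = ["age", "hours_per_week"]
categorical = ["workclass"]

preprocess = ColumnTransformer([
    # Numeric columns: impute missing values, then scale.
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    # Categorical columns: impute, then one-hot encode.
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("encode", OneHotEncoder(handle_unknown="ignore"))]),
     categorical),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("select", SelectKBest(f_classif, k=3)),   # feature selection
    ("clf", RandomForestClassifier(random_state=0)),
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)
model.fit(X_train, y_train)
print(round(model.score(X_test, y_test), 3))
```

Wrapping every step in one Pipeline keeps preprocessing fitted only on the training split, which is what makes the pipeline reproducible and leakage-free.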
Text
To Be Added.
Image
To Be Added.
Datasets
Tabular: Adult Census Income — Predict whether income exceeds $50K/yr based on census data.
[UCI Dataset Link]
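Since the task is to predict whether income exceeds $50K/yr, the raw label must be binarized. A minimal sketch, using income labels as they appear in the UCI file (the loading method, sample values, and derived column name here are assumptions):

```python
# Hypothetical sketch: binarizing the Adult income target.
# The sample rows and column names are illustrative only.
import pandas as pd

sample = pd.DataFrame({
    "age": [39, 50, 38],
    "education": ["Bachelors", "Bachelors", "HS-grad"],
    "income": ["<=50K", ">50K", "<=50K"],
})

# Target is 1 if income exceeds $50K/yr, else 0.
sample["income_gt_50k"] = (sample["income"].str.strip() == ">50K").astype(int)
print(sample["income_gt_50k"].tolist())  # -> [0, 1, 0]
```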
Text: To Be Added.
Images: To Be Added.
How to run
Requirements
- Python 3.12+
- Jupyter Notebook or Google Colab
Run the notebook in Google Colab:
- Open the notebook in Google Colab.
- Select Runtime → Run all.
- The entire process (installing libraries, loading data, training, evaluating) will run automatically.
Run the notebook locally:
- Create a virtual environment (recommended):
python -m venv venv
source venv/bin/activate    # Linux/Mac
venv\Scripts\activate       # Windows
- Install dependencies:
pip install -r requirements.txt
- Run the notebook with Jupyter:
cd notebooks
jupyter notebook notebook_name.ipynb
Links to Resources
Tabular Dataset Notebooks (Adult Census Income):
Model Comparison
The following figure summarizes the performance of the different models used in our pipeline:

Figure 1: SVM with PCA retaining 90% of the variance performs best on the Adult Census Income dataset.
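The best configuration named in Figure 1 can be sketched as a PCA + SVM pipeline, where PCA keeps the smallest number of components explaining 90% of the variance. This is a hedged sketch on synthetic data, not the assignment's actual code; kernel and other settings are assumptions.

```python
# Sketch of the Figure 1 winner: SVM on PCA features retaining 90% variance.
# Synthetic data stands in for the preprocessed Adult features.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

svm_pca = Pipeline([
    ("scale", StandardScaler()),
    # A float n_components keeps enough components for 90% explained variance.
    ("pca", PCA(n_components=0.90)),
    ("svm", SVC(kernel="rbf")),
])
svm_pca.fit(X_train, y_train)
print(round(svm_pca.score(X_test, y_test), 3))
```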