ML-Assignment-DNAC1 — ML Pipelines Assignment

Overview

Project Summary

Status: In Progress

This repository contains an assignment for the Machine Learning course. The goal is to build end-to-end pipelines for tabular, text, and image datasets using both traditional ML and deep learning approaches.

Learning Objectives

Design reproducible data preprocessing and feature engineering pipelines.
Train, evaluate, and compare traditional and deep-learning models.
Work with tabular, text, and image datasets.

Pipelines

Architecture & Steps

Common Components

Exploratory Data Analysis (EDA)
Preprocessing & feature engineering
Model training & hyperparameter tuning
Evaluation & visualizations
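The training, tuning, and evaluation components above can be sketched as a cross-validated grid search followed by a single held-out evaluation. This is a minimal sketch that assumes scikit-learn (the README does not name the library). The synthetic data from make_classification, the LogisticRegression model, and the grid over C are placeholders, not the team's actual settings.

```python
# Minimal sketch: cross-validated hyperparameter tuning, then evaluation on a
# held-out test set. Assumes scikit-learn; data, model, and grid are placeholders.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Placeholder binary classification data standing in for a real dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Hold out a test set before tuning so it is never seen during the search.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Hypothetical search space over the inverse regularization strength C.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
    scoring="f1",
)
search.fit(X_train, y_train)  # refits the best model on all training data

# Evaluate the refitted best model on the untouched test set.
y_pred = search.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
test_f1 = f1_score(y_test, y_pred)

print("best params:", search.best_params_)
print(f"test accuracy={test_accuracy:.3f}, test F1={test_f1:.3f}")
```

Keeping the test set out of the search means the reported scores are not inflated by the tuning itself.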

Per-Modality Pipelines

Tabular

Steps: EDA → imputation → scaling → categorical encoding → feature selection → train/test split → modeling → training → testing
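The tabular steps (imputation, scaling, categorical encoding, feature selection, modeling) can be sketched as a single scikit-learn Pipeline. This is a minimal sketch under assumptions: scikit-learn and pandas are the libraries in use, the column names only mimic the Adult dataset, the data is synthetic, and SelectKBest with k=4 and LogisticRegression are illustrative choices rather than the team's configuration. One design choice differs from the order listed above: the split is done first and every preprocessing step is fit inside the Pipeline, so imputation medians, scaling statistics, and selected features are learned from the training rows only.

```python
# Minimal sketch of the tabular pipeline. Assumes scikit-learn and pandas.
# Column names mimic the Adult dataset; all data here is synthetic.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "age": rng.integers(17, 90, n).astype(float),
    "hours-per-week": rng.integers(1, 99, n).astype(float),
    "workclass": rng.choice(["Private", "Self-emp-not-inc", "Local-gov"], n),
    "sex": rng.choice(["Male", "Female"], n),
})
# Inject missing values so the imputers have something to fill.
df.loc[rng.choice(n, 20, replace=False), "age"] = np.nan
df.loc[rng.choice(n, 20, replace=False), "workclass"] = np.nan
# Synthetic label loosely tied to age and hours (1 stands for ">50K").
y = ((df["age"].fillna(40) + df["hours-per-week"]) > 90).astype(int)

numeric = ["age", "hours-per-week"]
categorical = ["workclass", "sex"]

preprocess = ColumnTransformer([
    # Numeric columns: median imputation, then standardization.
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    # Categorical columns: most-frequent imputation, then one-hot encoding.
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("select", SelectKBest(f_classif, k=4)),  # keep the 4 highest-scoring features
    ("clf", LogisticRegression(max_iter=1000)),
])

# Split first; all steps above are then fit on the training rows only.
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.25, stratify=y, random_state=0
)
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.3f}")
```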

Text

To Be Added.

Image

To Be Added.

Datasets

Tabular: Adult Census Income — Predict whether income exceeds $50K/yr based on census data.
[UCI Dataset Link]
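Before imputation, the raw UCI files usually need light cleaning. To our understanding, the distributed adult.data and adult.test files have no header row, put a space after each comma, mark missing values with "?", and (in adult.test) end each label with a period. The sketch below assumes pandas and those conventions; the four rows and the column subset are made up for illustration and are not records from the dataset.

```python
# Hedged sketch: cleaning rows in the raw Adult format, assuming ", "
# separators, "?" for missing values, and an optional trailing "." on labels.
import io

import pandas as pd

# Made-up rows in the raw format (only a few of the real columns).
raw = io.StringIO(
    "39, State-gov, 40, <=50K\n"
    "52, ?, 45, >50K\n"
    "28, Private, 40, >50K.\n"
    "44, Private, 60, <=50K.\n"
)
df = pd.read_csv(
    raw,
    header=None,
    names=["age", "workclass", "hours-per-week", "income"],
    skipinitialspace=True,  # drop the space that follows each comma
    na_values="?",          # treat "?" as a missing value
)

# Normalize labels ("<=50K." -> "<=50K") and encode ">50K" as 1, else 0.
df["income"] = df["income"].str.rstrip(".")
df["target"] = (df["income"] == ">50K").astype(int)
print(df)
```

After this step, the "?" entries are ordinary NaN values that the imputation stage can handle.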

Text: To Be Added.

Images: To Be Added.

Run & Reproduce

How to run

Requirements

Python 3.12+
Jupyter Notebook or Google Colab

Run the notebook on Google Colab:

Open the notebook in Google Colab.
Select Runtime → Run all.
The entire process (installing libraries, loading data, training, evaluating) will run automatically.

Run the notebook locally:

Create a virtual environment (recommended):

```
python -m venv venv
source venv/bin/activate   # Linux/Mac
venv\Scripts\activate      # Windows
```

Install dependencies:
```
pip install -r requirements.txt
```

Run the notebook with Jupyter, replacing notebook_name.ipynb with the notebook you want to open:

```
cd notebooks
jupyter notebook notebook_name.ipynb
```

Links

Links to Resources

🔗 View Repository on GitHub

Tabular Dataset Notebooks (Adult Census Income):

Traditional Pipeline:
Deep Learning Pipeline:

Experiment Results

Model Comparison

The following figure summarizes the performance comparison between different models used in our pipeline:

Figure 1: SVM with 90% PCA performs best on the Adult Census Income dataset.
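A model of the kind named in Figure 1 can be sketched as a PCA step feeding an SVM. This sketch assumes scikit-learn and assumes that "90% PCA" means keeping enough principal components to explain 90% of the variance; in scikit-learn, passing a float between 0 and 1 as n_components selects components that way. The synthetic data, the StandardScaler step, and the RBF kernel with C=1.0 are placeholders, not the settings behind the reported result.

```python
# Hedged sketch of "SVM with 90% PCA", assuming 90% = fraction of variance
# kept by PCA. Assumes scikit-learn; data and SVM settings are placeholders.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(
    n_samples=600, n_features=30, n_informative=8, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)

svm_pca = Pipeline([
    ("scale", StandardScaler()),      # PCA is sensitive to feature scale
    ("pca", PCA(n_components=0.90)),  # float in (0, 1): variance ratio to keep
    ("svm", SVC(kernel="rbf", C=1.0)),
])
svm_pca.fit(X_train, y_train)

pca = svm_pca.named_steps["pca"]
kept = pca.n_components_                         # components actually retained
explained = pca.explained_variance_ratio_.sum()  # variance they explain
accuracy = svm_pca.score(X_test, y_test)
print(f"components kept: {kept} of {X.shape[1]}, variance explained: {explained:.2f}")
print(f"test accuracy: {accuracy:.3f}")
```

Because PCA sits inside the Pipeline, the components are fit on the training split only and then applied unchanged to the test split.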

Repository

Files & Structure

ML-Assignment-DNAC1/
├─ docs/            # This GitHub Page
│  └─ index.html    # This file
├─ features/        # Feature extraction files
├─ modules/         # Supporting modules
├─ notebooks/       # Jupyter notebooks for configuration and experimentation
├─ reports/         # Reports, results, and visualizations
├─ requirements.txt # Library requirements
└─ README.md        # This project's README

Authors

DNAC1 Team

Team Members

| Name | Student ID | Role |
|---|---|---|
| Cao Huu Thien Hoang | 2311030 | Deep Learning Training |
| Le Tien Dat | 2310653 | EDA and Feature Engineering |
| Tran Vinh Dung | 2310574 | Traditional Training |

Quick Notes

This project is still a work in progress.