Evaluating small neural networks for general-purpose lossy data compression
This repository was archived on 2025-12-23. It is read-only: you can view files and clone it, but you cannot change its state, such as by pushing or creating issues, pull requests, or comments.

neural compression

Running locally

uv sync --all-extras

Example usage:

# Fetching
python main.py --debug train --method fetch \
  --dataset enwik9 --data-root /path/to/datasets

# Training
python main.py --debug train --method optuna \
  --dataset enwik9 --data-root /path/to/datasets \
  --model cnn --model-save-path /path/to/optuna-model
python main.py --debug --results /path/to/results train --method full \
  --dataset enwik9 --data-root /path/to/datasets \
  --model-load-path /path/to/optuna-model --model-save-path /path/to/full-model

# Compressing
python benchmark.py --debug compress \
  --model-load-path /path/to/full-model \
  --input-file inputfile --output-file outputfile
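After compressing, it is useful to sanity-check the output size and the round trip. The sketch below uses lossless gzip as a stand-in codec purely for illustration (the repository's neural codec is lossy, so a real check would measure reconstruction error such as MSE rather than requiring byte equality); the file names are hypothetical:

```shell
#!/bin/sh
# Hypothetical sanity check: compress, decompress, report the ratio.
# gzip stands in for the neural codec here; with a lossy model you
# would compare input vs. reconstruction error instead of using cmp.
set -e

input=inputfile
yes 'hello compression' | head -n 100 > "$input"   # toy input data

gzip -k -f "$input"                                # produces inputfile.gz
orig=$(wc -c < "$input")
comp=$(wc -c < "$input.gz")

# Compression ratio: original size / compressed size
ratio=$(awk -v o="$orig" -v c="$comp" 'BEGIN{printf "%.2f", o/c}')
echo "ratio: $ratio"

# A lossless round trip must reproduce the input exactly
gunzip -c "$input.gz" > "$input.out"
cmp "$input" "$input.out" && echo "round trip OK"
```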

Testing compression:

bash config/download_datasets.sh config/urls.txt /home/tdpeuter/data/ml-inputs
bash config/generate_csv.sh > config/sub.csv
bash config/local.sh
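The `measure_gzip_lz4.sh` script benchmarks CPU lossless baselines for comparison. Its exact output format is not shown here, but the idea can be sketched as a sweep over compression levels; the sketch below covers gzip only and writes a CSV, which is an assumption about the shape of the results, not the script's actual behavior:

```shell
#!/bin/sh
# Hypothetical baseline sweep: compression ratio per gzip level.
# The real measure_gzip_lz4.sh likely also covers lz4 and records
# time/memory; this sketch only reports size ratios to a CSV.
set -e

input=sample.bin
head -c 65536 /dev/zero | tr '\0' 'a' > "$input"   # highly compressible sample

echo "level,original_bytes,compressed_bytes,ratio" > baselines.csv
for level in 1 6 9; do
    gzip -c -"$level" "$input" > "$input.$level.gz"
    orig=$(wc -c < "$input")
    comp=$(wc -c < "$input.$level.gz")
    ratio=$(awk -v o="$orig" -v c="$comp" 'BEGIN{printf "%.2f", o/c}')
    echo "$level,$orig,$comp,$ratio" >> baselines.csv
done
cat baselines.csv
```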

Running on the Ghent University HPC

See the Infrastructure docs for more information about the clusters.

module swap cluster/joltik # Specify the (GPU) cluster, {joltik,accelgor,litleo}

qsub job.pbs               # Submit job
qstat                      # Check status
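For reference, a PBS job script for these clusters generally has the shape sketched below. The resource values, module name, and `$VSC_DATA` paths are illustrative assumptions, not the contents of the repository's `job.pbs`:

```shell
#!/bin/bash
#PBS -N neural-compression        # job name (illustrative)
#PBS -l nodes=1:ppn=8:gpus=1      # assumed resource request, adjust as needed
#PBS -l walltime=12:00:00
#PBS -l mem=32gb

cd "$PBS_O_WORKDIR"               # start from the submission directory

# Module name is an assumption; check `module avail` on the cluster.
module load Python

python main.py --debug train --method full \
  --dataset enwik9 --data-root "$VSC_DATA/datasets" \
  --model-save-path "$VSC_DATA/models/full-model"
```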