Evaluating small neural networks for general-purpose lossy data compression
This repository was archived on 2025-12-23. It is read-only: you can view files and clone it, but you cannot change its state, such as by pushing or creating issues, pull requests, or comments.

neural compression

Running locally

uv sync --all-extras

Example usage:

# Fetching
python main.py --debug train --method fetch \
  --dataset enwik9 --data-root /path/to/datasets

# Training
python main.py --debug train --method optuna \
  --dataset enwik9 --data-root /path/to/datasets \
  --model cnn --model-save-path /path/to/optuna-model
python main.py --debug --results /path/to/results train --method full \
  --dataset enwik9 --data-root /path/to/datasets \
  --model-load-path /path/to/optuna-model --model-save-path /path/to/full-model

# Compressing
python benchmark.py --debug compress \
  --model-load-path /path/to/full-model \
  --input-file inputfile --output-file outputfile
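After compressing, it is useful to sanity-check the output size and the round trip. The sketch below uses lossless gzip as a stand-in codec purely for illustration (the repository's neural codec is lossy, so a real check would measure reconstruction error such as MSE rather than requiring byte equality); the file names are hypothetical:

```shell
#!/bin/sh
# Hypothetical sanity check: compress, decompress, report the ratio.
# gzip stands in for the neural codec here; with a lossy model you
# would compare input vs. reconstruction error instead of using cmp.
set -e

input=inputfile
yes 'hello compression' | head -n 100 > "$input"   # toy input data

gzip -k -f "$input"                                # produces inputfile.gz
orig=$(wc -c < "$input")
comp=$(wc -c < "$input.gz")

# Compression ratio: original size / compressed size
ratio=$(awk -v o="$orig" -v c="$comp" 'BEGIN{printf "%.2f", o/c}')
echo "ratio: $ratio"

# A lossless round trip must reproduce the input exactly
gunzip -c "$input.gz" > "$input.out"
cmp "$input" "$input.out" && echo "round trip OK"
```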

Testing compression:

bash config/download_datasets.sh config/urls.txt /home/tdpeuter/data/ml-inputs
bash config/generate_csv.sh > config/sub.csv
bash config/local.sh
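The `measure_gzip_lz4.sh` script benchmarks CPU lossless baselines for comparison. Its exact output format is not shown here, but the idea can be sketched as a sweep over compression levels; the sketch below covers gzip only and writes a CSV, which is an assumption about the shape of the results, not the script's actual behavior:

```shell
#!/bin/sh
# Hypothetical baseline sweep: compression ratio per gzip level.
# The real measure_gzip_lz4.sh likely also covers lz4 and records
# time/memory; this sketch only reports size ratios to a CSV.
set -e

input=sample.bin
head -c 65536 /dev/zero | tr '\0' 'a' > "$input"   # highly compressible sample

echo "level,original_bytes,compressed_bytes,ratio" > baselines.csv
for level in 1 6 9; do
    gzip -c -"$level" "$input" > "$input.$level.gz"
    orig=$(wc -c < "$input")
    comp=$(wc -c < "$input.$level.gz")
    ratio=$(awk -v o="$orig" -v c="$comp" 'BEGIN{printf "%.2f", o/c}')
    echo "$level,$orig,$comp,$ratio" >> baselines.csv
done
cat baselines.csv
```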

Running on the Ghent University HPC

See the Infrastructure docs for more information about the clusters.

module swap cluster/joltik # Specify the (GPU) cluster, {joltik,accelgor,litleo}

qsub job.pbs               # Submit job
qstat                      # Check status
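For reference, a PBS job script for these clusters generally has the shape sketched below. The resource values, module name, and `$VSC_DATA` paths are illustrative assumptions, not the contents of the repository's `job.pbs`:

```shell
#!/bin/bash
#PBS -N neural-compression        # job name (illustrative)
#PBS -l nodes=1:ppn=8:gpus=1      # assumed resource request, adjust as needed
#PBS -l walltime=12:00:00
#PBS -l mem=32gb

cd "$PBS_O_WORKDIR"               # start from the submission directory

# Module name is an assumption; check `module avail` on the cluster.
module load Python

python main.py --debug train --method full \
  --dataset enwik9 --data-root "$VSC_DATA/datasets" \
  --model-save-path "$VSC_DATA/models/full-model"
```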