Running Models Module 25 5 min

Hugging Face and
Git LFS.

Cloning a 40 GB model. Model cards, safetensors.

Prerequisites·None Modalities
HF + Git LFS hero illustration
§ 01

Core ideas

THE HUB

Hugging Face is GitHub for model weights

Every model repo is a real git repository: code, config, tokenizer files — plus the weights. git clone works, but weights don't live in git history directly: Git LFS stores them as pointers, fetching the multi-GB blobs from object storage on checkout. Cloning a 40 GB model is mostly an LFS download.
SAFETENSORS

safetensors replaced pickle for a reason

PyTorch's original .bin checkpoints are Python pickles — loading one can execute arbitrary code. .safetensors is a flat, zero-copy tensor format: memory-mappable, faster to load, and incapable of running code. Treat .bin from strangers like an unsigned executable.
MODEL CARDS

The README is the contract

A model card states the license (Apache-2.0 vs research-only changes what you can ship), training data provenance, eval scores, and intended use. Production rule: no card, no deploy. The card is also where quantized variants and chat-template details hide.
PRACTICAL

The commands you actually run

hf download org/model (the CLI was renamed from huggingface-cli in 2025) beats raw clone — resumable, parallel, skips git history, and uses the new Xet chunk-dedup storage backend. HF_HOME controls the cache so 40 GB doesn't land on your system drive. Gated models (Llama) need a token with accepted terms. Revisions pin reproducibility: --revision main today is not --revision main next month.
HF + Git LFS spotlight illustration
§ 02

The lesson

Cloning 40GB neural networks locally from the absolute center of open-source AI.

The GitHub of Machine Learning

Hugging Face is the central nervous system for open-source intelligence. It hosts hundreds of thousands of base models, fine-tunes, datasets, and LoRA adapters. Almost every researcher and engineer relies entirely on this public registry. 🤗 meta-llama / Meta-Llama-3-8B ♡ 12.4k Model Card: Llama 3 is an auto-regressive language model that uses an optimized transformer architecture. GGUF PyTorch Safetensors

Git Large File Storage (LFS)

Because models are fundamentally just massive matrices of floating-point numbers stored in generic `.bin` or `.safetensors` files, they weigh dozens of gigabytes. Git itself handles big files fine — the problem is that storing multi-gigabyte binaries in version history is unworkable: every revision of every weight file would live in the repo forever. `Git LFS` (Large File Storage) solves this by committing tiny pointer files to git and fetching the actual blobs from object storage on demand. # 1. Install Git LFS specifically
$ git lfs install
> Git LFS initialized.

# 2. Clone the massive repository
$ git clone https://huggingface.co/meta-llama/Meta-Llama-3-8B
> Cloning into 'Meta-Llama-3-8B'...
> remote: Enumerating objects: 45, done.
> Filtering content: 100% (4/4), 14.8 GB | 45.1 MB/s, done.

# 3. Model is now physically localized onto your drive
$ ls -lh
> -rw-r--r-- 1 user 4.0G model-00001-of-00004.safetensors
> -rw-r--r-- 1 user 4.0G model-00002-of-00004.safetensors
> -rw-r--r-- 1 user 3.9G model-00003-of-00004.safetensors
> -rw-r--r-- 1 user 2.9G model-00004-of-00004.safetensors
2026

What changed since the LFS era

Across 2025 Hugging Face migrated Hub storage to Xet, which deduplicates at the chunk level — re-uploading a fine-tune only transfers the bytes that actually changed; LFS is now the legacy path. The CLI was renamed too: `huggingface-cli` became `hf`, so downloads are now `hf download org/model`. And the #1 practical gotcha when running a downloaded model: the chat template lives in `tokenizer_config.json` — feed the model raw text without applying it and quality silently collapses.
§ 03

Further reading.

Done with HF + Git LFS?
Mark it complete — progress is saved in your browser and shows on the course map.