Paste P86472

Train 'answerdotai/ModernBERT-base' using Trainer on CPU

Authored by gkyziridis on Dec 9 2025, 1:11 PM.
$ docker run --rm -it --network=host torch_rocm3
Python 3.11.2 (main, Apr 28 2025, 14:11:48) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> from transformers import (
...     AutoTokenizer,
...     AutoModelForSequenceClassification,
...     Trainer,
...     TrainingArguments
... )
>>>
>>>
>>> print(torch.__version__)
2.6.0+rocm6.1
>>>
>>>
>>> MODEL = "answerdotai/ModernBERT-base"
>>> DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
>>>
>>> print(f"Device: {DEVICE}")
Device: cpu
>>>
>>>
>>> tokenizer = AutoTokenizer.from_pretrained(MODEL)
tokenizer_config.json: 20.8kB [00:00, 53.1MB/s]
tokenizer.json: 2.13MB [00:00, 115MB/s]
special_tokens_map.json: 100%|███████████████████████████████████████████████████████████████| 694/694 [00:00<00:00, 4.88MB/s]
>>>
>>>
>>> model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)
Some weights of ModernBertForSequenceClassification were not initialized from the model checkpoint at answerdotai/ModernBERT-base and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
>>>
>>>
>>> texts = ["hello world", "modernbert test"]
>>> labels = [0, 1]
>>>
>>> batch = tokenizer(
...     texts,
...     padding=True,
...     truncation=True,
...     return_tensors="pt"
... )
>>> batch["labels"] = torch.tensor(labels)
>>>
>>>
>>> train_data = []
>>> for i in range(len(labels)):
...     train_data.append({
...         "input_ids": batch["input_ids"][i],
...         "attention_mask": batch["attention_mask"][i],
...         "labels": batch["labels"][i]
...     })
...
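The loop above simply transposes a dict of equal-length columns into a list of per-example dicts, which is one of the dataset shapes `Trainer` accepts. A generic stdlib sketch of the same transposition (the sample column values are illustrative):

```python
def columns_to_rows(batch):
    """Turn {"col": [v0, v1, ...], ...} into [{"col": v0, ...}, {"col": v1, ...}, ...]."""
    keys = list(batch)
    n = len(batch[keys[0]])
    return [{k: batch[k][i] for k in keys} for i in range(n)]

rows = columns_to_rows({"input_ids": [[1, 2], [3, 4]], "labels": [0, 1]})
print(rows[1])  # {'input_ids': [3, 4], 'labels': 1}
```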
>>>
>>> args = TrainingArguments(
...     output_dir="./out",
...     per_device_train_batch_size=2,
...     max_steps=1,  # just 1 training step
...     report_to="none",
... )
>>> trainer = Trainer(
...     model=model,
...     args=args,
...     train_dataset=train_data
... )
>>>
>>>
>>> trainer.train()
{'train_runtime': 1.686, 'train_samples_per_second': 1.186, 'train_steps_per_second': 0.593, 'train_loss': 0.6049032807350159, 'epoch': 1.0}
100%|███████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.69s/it]
TrainOutput(global_step=1, training_loss=0.6049032807350159, metrics={'train_runtime': 1.686, 'train_samples_per_second': 1.186, 'train_steps_per_second': 0.593, 'total_flos': 6655426680.0, 'train_loss': 0.6049032807350159, 'epoch': 1.0})
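As a sanity reference on the loss value: with a freshly initialized two-class head, the logits are near zero, so the softmax is close to uniform and the cross-entropy should sit near ln(2) ≈ 0.693. The reported 0.6049 after a single optimizer step is therefore in the expected range. A quick check:

```python
import math

# Cross-entropy of an uninformative 2-class classifier: -ln(1/2) = ln(2).
baseline = math.log(2)
print(round(baseline, 4))  # 0.6931
```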
>>>
>>> print("\nRunning prediction...")
Running prediction...
>>>
>>> with torch.no_grad():
...     outputs = model(batch["input_ids"].to(DEVICE), attention_mask=batch["attention_mask"].to(DEVICE))
...     logits = outputs.logits.cpu()
>>>
>>> all_zero = torch.all(logits == 0)
>>> has_nan = torch.isnan(logits).any()
>>>
>>> print("\n=== RESULTS ===")
>>> print("Logits:")
>>> print(logits)
>>> print("All zeros? :", bool(all_zero))
>>> print("Contains NaNs?", bool(has_nan))
=== RESULTS ===
Logits:
tensor([[-0.2067, -1.2938],
        [-0.9808,  0.4442]])
All zeros? : False
Contains NaNs? False
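The same all-zeros/NaN checks can be reproduced with the stdlib alone, using the logit values copied from the output above:

```python
import math

# Logits printed by the run above.
logits = [[-0.2067, -1.2938], [-0.9808, 0.4442]]

all_zero = all(v == 0.0 for row in logits for v in row)
has_nan = any(math.isnan(v) for row in logits for v in row)
print(all_zero, has_nan)  # False False
```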
