Pearl Mainnet — May 2026 — Community Guide

GPU Mining Pearl Network

Plug-and-play guide for mining PRL on 2x H200 SXM.
Built from real production runs. Works with any AI assistant for a step-by-step walkthrough.

🤖 AI-Powered Setup: feed this guide to Claude, ChatGPT or any LLM for live assistance.
2x H200 SXM CUDA 12.8 vLLM 0.20.0 Llama 3.3 70B DP=2 Mode RunPod Ubuntu 22.04 LLM-Friendly
Hardware: 2×H200
Full load: ~690W
Peak GEMM: m=8174
Block reward: 2884 PRL
GPU util: 96–97%
Gateway: 4 sockets
AI-Powered Setup

🤖 Use this guide with any AI assistant

This guide is structured to work as a plug-and-play resource for LLMs. Upload the HTML file to Claude, ChatGPT, Gemini or any AI assistant and ask it to walk you through setup step by step — one command at a time, verifying each output before moving on. No prior Linux or mining experience needed.

Step 01
Download the guide
Click "Download HTML" to save the guide file locally
Step 02
Open your AI assistant
Claude, ChatGPT, Gemini — any LLM that accepts file uploads
Step 03
Upload the guide + prompt
Say: "Follow this guide to help me set up a Pearl miner. One command at a time, verify each output."
Step 04
Follow along
Paste each command output back to the AI — it verifies and moves on
⚡ Tip: Expose Port 44108 Before Deploying
Pearl's P2P port is 44108. Adding it in RunPod's TCP port exposures field before deploying gives you more peers and faster block propagation. It takes 5 seconds. If you forgot, don't restart for it; just add it next time. 16 peers works fine for mining.
Mining Guide

Pearl Miner Setup Guide

Complete installation, configuration, health monitoring & troubleshooting

RunPod 2x H200 CUDA 12.8+ DP=2 Mode Llama 70B

RunPod / Vast.ai / Lambda CUDA 12.4+ H100 / H200 Required DP=2 Mode Llama 70B Ubuntu 22.04

Contents

Hardware Requirements & Pod Verification (READ FIRST)
00a Cloud Provider Paths & Compatibility
00 Quick Reference (Key Settings)
01 System Dependencies
02 Clone & Build
02b Set Environment Variables (Persistent)
03 Wallet & Node Setup
04 Start Mining
04d Apply NOISY_GEMM Debug Patch (Mandatory)
05 Health Check & Diagnostics
06 Troubleshooting
07 Block Verification
08 Quick Restart Reference
08b Additional Critical Notes
09 Key Lessons Learned
10 Important Gotchas & Edge Cases

Hardware Requirements & Pod Verification

🚨 This guide was built and tested exclusively on RunPod 2x H200 SXM. Every command, expected output, VRAM value, power draw figure, and health check threshold in this guide is calibrated for that specific setup. If you use different hardware, commands will still work but expected output values will differ.

What This Guide Is Designed For

Component | This Guide's Setup | Notes
GPU | 2x NVIDIA H200 SXM | Confirmed working. All expected values in this guide are for H200.
VRAM per GPU | 143,771 MiB (~141GB) | After model load: ~132,964 MiB GPU0 / ~131,399 MiB GPU1
CUDA Version | 12.8 | Minimum: 12.4. Tested on 12.8.
Driver | 570.211.01 | 550+ is needed for CUDA 12.4 (the minimum above); 570+ for CUDA 12.8
GPU Count | Exactly 2 | Guide uses --data-parallel-size 2
System RAM | 64GB+ | Needed for build + model loading
Disk | 300GB+ persistent | Model ~140GB + builds ~50GB + chain ~5GB
OS | Ubuntu 22.04 | RunPod default image
Provider | RunPod | See Section 00a for other providers
GPU power at full mining | ~690W each (near 700W TDP) | This is the health indicator: if power is around 120W, mining is not happening

Other Hardware — Community Reports (Not Tested by This Guide)

⚠️ The following is based on Pearl Discord community reports — NOT verified by this guide. If you use different hardware, expected output values will differ from what this guide shows. Proceed with caution and adapt health check thresholds accordingly.
GPU | Community Status | Notes
H200 SXM ×2 | ✅ This guide, confirmed | Reference setup for this guide
H100 SXM ×2 | ✅ Community confirmed | Works. 80GB VRAM each. Adjust VRAM expectations in health checks.
H100 NVL ×1 + H200 ×1 | ⚠️ Community reported | Mixed setup. Some users got blocks.
Single H200 | ⚠️ Possible | Use --data-parallel-size 1, 64 requests. Lower hashrate.
A100 ×2 | ❌ Not recommended | Ampere architecture; Pearl kernel targets Hopper. May not compile.
RTX 4090 ×2 | ❌ Insufficient VRAM | 24GB each = 48GB total. Not enough for 70B model.

Step 0 — Verify Your Pod Before Starting

Run these immediately after SSH-ing in. If any check fails, reprovision before continuing.

Check 1 — GPU model, CUDA, VRAM
nvidia-smi
✅ Good (H200)
2x H200, CUDA 12.8, Driver 570+, 143771 MiB each, 0 MiB used
❌ Bad
Wrong GPU, CUDA <12.4, only 1 GPU, or VRAM already used → reprovision
Check 2 — Disk space (need 300GB+ free)
df -h --output=avail,size,target | sort -rh | head -8
✅ Good
300GB+ available on at least one partition
❌ Bad
Less than 300GB → expand disk before proceeding
Check 3 — RAM
free -h
✅ Good
64GB+ total RAM
❌ Bad
Under 64GB → may OOM during build or model load
Check 4 — OS
lsb_release -a 2>/dev/null || cat /etc/os-release | head -5
✅ Good
Ubuntu 22.04 LTS (Jammy)
⚠️ Untested
Ubuntu 20.04 — may work but not verified by this guide
Check 5 — Internet
curl -s --max-time 5 https://github.com > /dev/null && echo "GitHub OK" && curl -s --max-time 5 https://huggingface.co > /dev/null && echo "HuggingFace OK"
✅ Good
GitHub OK / HuggingFace OK
❌ Bad
Blocked → check provider firewall / outbound rules

Step 0b — Expose Port 44108 Before Deploying (Do This First!)

⚠️ Pearl's P2P port is 44108. If you expose it before deploying your pod, other nodes on the network can connect TO you (inbound connections), giving you more peers and faster block propagation. If you don't expose it, you'll be limited to ~16 outbound-only peers — which still works fine for mining but is not optimal.
ℹ️ On RunPod: before clicking Deploy, find the TCP Port Exposures field and add port 44108. This takes 5 seconds and costs nothing. If your pod is already running, it requires a full restart to add — only worth it at natural restart time.
Scenario | Peers | Impact
Port 44108 NOT exposed (RunPod default) | ~16 outbound only | Works fine. Block propagation slightly slower.
Port 44108 exposed | Up to 200 inbound+outbound | Better connectivity, faster block propagation.
Discord reports of 200+ peers | 200+ | These users have inbound port exposed AND are on providers with open firewall.
✅ If you already deployed without exposing port 44108 — don't restart just for this. Wait until next natural restart and add it then. 16 peers does not meaningfully affect your mining rewards.
Full pre-flight one-liner
echo "=== GPU ==="; nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv,noheader; echo "=== DISK ==="; df -h --output=avail,size,target | sort -rh | head -5; echo "=== RAM ==="; free -h | grep Mem; echo "=== OS ==="; lsb_release -d 2>/dev/null; echo "=== NETWORK ==="; curl -s --max-time 5 https://github.com > /dev/null && echo "GitHub OK" || echo "GitHub BLOCKED"
✅ Only proceed to Step 01 if: 2x H200 (or compatible GPU), CUDA 12.4+, 300GB+ disk, 64GB+ RAM, Ubuntu 22.04, GitHub reachable.
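If you would rather have a script compare the pod against the reference setup than read nvidia-smi by eye, here is a minimal Python sketch. It only parses the `name, memory.total` CSV that nvidia-smi prints; the function names and the idea of flagging any VRAM value that differs from the H200 reference are assumptions for illustration, not part of the Pearl tooling.

```python
import subprocess

MIN_GPUS = 2                   # the guide runs --data-parallel-size 2
REFERENCE_VRAM_MIB = 143771    # H200 value reported in this guide


def parse_gpu_csv(text):
    """Turn 'name, memory.total' CSV rows into (name, total_mib) tuples."""
    gpus = []
    for line in text.strip().splitlines():
        name, memory = [field.strip() for field in line.split(",")]
        gpus.append((name, int(memory.split()[0])))  # "143771 MiB" -> 143771
    return gpus


def preflight_notes(gpus):
    """Return human-readable notes; an empty list means the pod matches the guide."""
    notes = []
    if len(gpus) < MIN_GPUS:
        notes.append(f"found {len(gpus)} GPU(s), guide expects {MIN_GPUS}")
    for index, (name, total_mib) in enumerate(gpus):
        if total_mib != REFERENCE_VRAM_MIB:
            notes.append(
                f"GPU{index} ({name}) has {total_mib} MiB, "
                f"guide values assume {REFERENCE_VRAM_MIB} MiB"
            )
    return notes


def query_gpus():
    """Run nvidia-smi on the pod and parse its output (needs a GPU host)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)


# Sample text in the shape nvidia-smi prints on the reference pod.
sample = "NVIDIA H200, 143771 MiB\nNVIDIA H200, 143771 MiB\n"
print(preflight_notes(parse_gpu_csv(sample)))
```

On a real pod, call `preflight_notes(query_gpus())`. A note about VRAM on H100 is not a failure; it just means the expected values later in the guide need adjusting.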

00a Cloud Provider Paths & Compatibility

This guide was built and tested on RunPod. The core setup is identical across providers — only storage paths and a few installation details differ. Use this table to adapt the guide for your provider.

Provider | HF_HOME path | UV Cache path | Persistent storage | Notes
RunPod ✅ Tested | /workspace/.hf | /workspace/.uv-cache | /workspace | Deadsnakes PPA blocked: use UV for Python 3.12. Ubuntu 22.04.
Vast.ai | /root/.cache/huggingface or /workspace/.hf | /root/.cache/uv | /workspace (if attached) | Use Custom Template. Ubuntu 22.04 works. apt python3.12 may work via deadsnakes.
Lambda Labs | /home/ubuntu/.cache/huggingface | /home/ubuntu/.cache/uv | /home/ubuntu | Ubuntu 22.04. Python 3.12 via deadsnakes should work. Run as ubuntu, not root.
CoreWeave | /mnt/data/.hf | /mnt/data/.uv-cache | /mnt/data | Kubernetes-based. Persistent volume must be mounted manually.
Paperspace | /notebooks/.hf | /notebooks/.uv-cache | /notebooks | Ubuntu 20.04/22.04. Python 3.12 via deadsnakes.
Any provider | Any path with 200GB+ free space | Any writable path | Check df -h for largest partition | Find largest partition: df -h --output=avail,size,target | sort -rh | head -5

How to Adapt This Guide for Any Provider

Replace every occurrence of /workspace/.hf with your provider's persistent storage path, and /workspace/.uv-cache with the UV cache path. The two places these appear are:

1. In ~/.bashrc
export HF_HOME=/YOUR_PROVIDER_PATH/.hf
2. In the build:miner command (Step 2)
cd /root/pearl && export UV_CACHE_DIR=/YOUR_PROVIDER_PATH/.uv-cache && export HF_HOME=/YOUR_PROVIDER_PATH/.hf && task build:miner
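To avoid typos when swapping paths, the substitution can be generated instead of typed. The sketch below is an illustration only: the `PROVIDER_PATHS` dictionary copies the unambiguous rows of the provider table above, and the lowercase provider keys and the `export_lines` helper are made-up names.

```python
# (HF_HOME, UV_CACHE_DIR) per provider, copied from the table in this section.
PROVIDER_PATHS = {
    "runpod": ("/workspace/.hf", "/workspace/.uv-cache"),
    "lambda": ("/home/ubuntu/.cache/huggingface", "/home/ubuntu/.cache/uv"),
    "coreweave": ("/mnt/data/.hf", "/mnt/data/.uv-cache"),
    "paperspace": ("/notebooks/.hf", "/notebooks/.uv-cache"),
}


def export_lines(provider):
    """Return the two export lines to use in ~/.bashrc and before task build:miner."""
    hf_home, uv_cache = PROVIDER_PATHS[provider]
    return [f"export HF_HOME={hf_home}", f"export UV_CACHE_DIR={uv_cache}"]


for line in export_lines("lambda"):
    print(line)
```

Paste the printed lines in place of the `/workspace/...` exports wherever the guide sets HF_HOME or UV_CACHE_DIR.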

Python 3.12 Installation by Provider

Provider | Python 3.12 Method | Command
RunPod | apt blocked, use UV | uv python install 3.12
Vast.ai | Try apt first, fallback to UV | apt-get install -y python3.12 || uv python install 3.12
Lambda / Paperspace | apt via deadsnakes PPA | add-apt-repository ppa:deadsnakes/ppa && apt-get install -y python3.12
Any provider (universal) | UV always works | uv python install 3.12
ℹ️ UV-based Python install (Step 1) always works regardless of provider — it downloads a standalone CPython binary. Use it as the universal fallback if apt fails.

Pre-flight Check (run on any fresh pod)

Verify GPU + CUDA before starting
nvidia-smi && echo "CUDA OK" || echo "NO GPU DETECTED"
✅ Good
Shows H100/H200, CUDA 12.4 or newer, Driver 550+
❌ Bad
No GPU detected → wrong instance type, reprovision
Find largest storage partition (for HF_HOME)
df -h --output=avail,size,target | sort -rh | head -5

Pick the partition with 300GB+ free space for HF_HOME. The 70B model needs ~140GB.
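The same choice can be made programmatically. This is a small sketch using Python's standard `shutil.disk_usage`; the candidate paths, the 300 GB default and the helper names are assumptions chosen to mirror the text above.

```python
import shutil


def free_gb(path):
    """Free space on the filesystem holding `path`, in decimal gigabytes."""
    return shutil.disk_usage(path).free / 1e9


def pick_storage(candidates, min_free_gb=300, measure=free_gb):
    """Return (path, free_gb) for the roomiest candidate that meets the minimum, else None."""
    best = None
    for path in candidates:
        try:
            available = measure(path)
        except OSError:
            continue  # path does not exist on this provider
        if available >= min_free_gb and (best is None or available > best[1]):
            best = (path, available)
    return best


# Hypothetical candidate mount points; adjust to your provider.
print(pick_storage(["/workspace", "/root", "/"]))
```

If it prints `None`, no candidate has enough room and the disk should be expanded before the model download.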

00 Quick Reference

Setting | Value | Why
Parallelism | --data-parallel-size 2 | NOT tensor parallel: TP reduces the m dimension
Prefix Caching | --no-enable-prefix-caching | MUST disable: caching = no GEMM = no mining
Chunked Prefill | --no-enable-chunked-prefill | Must disable for correct mining behavior
GPU Memory | --gpu-memory-utilization 0.9 | Leave 10% headroom
Model Length | --max-model-len 8192 | Fits in 80GB VRAM
Execution | --enforce-eager | Required for Pearl kernel
ZK Speed | export RAYON_NUM_THREADS=96 | Faster proof generation
Deep GEMM | export VLLM_USE_DEEP_GEMM=0 | Disable: conflicts with Pearl GEMM
Requests | 32 Python workers (Step 4b), or 128 requests per batch in the bash fallback loop (Step 4c) | Long prompts (~150+ tokens) needed for m≥5000
Loop pattern | sleep 1 (NOT wait) | wait causes GPU to idle between batches → 0% util
Request port | port 8000 ONLY | DP=2 exposes a single port; requests to port 8001 are dropped silently
Socket Count | 4 ESTAB connections | 2 per DP engine = 4 total when healthy
n value in NOISY_GEMM | 57344 | Confirms DP mode (TP gives 28672)
Node RPC | port 44107 (pearld) | pearl daemon
Wallet RPC | port 44207 (oyster) | wallet daemon
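The flags in this table combine into a single `vllm serve` command. As a rough sketch, the Python below assembles that command from the table values so nothing gets dropped when editing by hand; the `vllm_command` helper and `MINER_ENV` name are illustrative, while the binary path, model name and flags are the ones this guide uses in Step 4.

```python
import shlex

VLLM_BIN = "/root/pearl/.venv/bin/vllm"
MODEL = "pearl-ai/Llama-3.3-70B-Instruct-pearl"

# Environment variables from the table; set these before launching.
MINER_ENV = {"VLLM_USE_DEEP_GEMM": "0", "RAYON_NUM_THREADS": "96"}


def vllm_command(dp_size=2, max_model_len=8192, gpu_mem=0.9, port=8000):
    """Build the argument list for the guide's vllm serve invocation."""
    return [
        VLLM_BIN, "serve", MODEL,
        "--host", "0.0.0.0",
        "--port", str(port),
        "--max-model-len", str(max_model_len),
        "--gpu-memory-utilization", str(gpu_mem),
        "--enforce-eager",
        "--data-parallel-size", str(dp_size),
        "--no-enable-prefix-caching",
        "--no-enable-chunked-prefill",
    ]


print(" ".join(f"{key}={value}" for key, value in MINER_ENV.items()))
print(shlex.join(vllm_command()))
```

For a single-GPU pod, `vllm_command(dp_size=1)` gives the variant mentioned in the community hardware table.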

01 System Dependencies

ℹ️ Run each block separately. Verify output before moving to next step.

Go Language

Run
wget -q https://go.dev/dl/go1.24.2.linux-amd64.tar.gz && tar -C /usr/local -xzf go1.24.2.linux-amd64.tar.gz && export PATH=$PATH:/usr/local/go/bin && echo 'export PATH=$PATH:/usr/local/go/bin' >> ~/.bashrc
Verify
go version
✅ Good
go version go1.24.2 linux/amd64
❌ Bad
command not found → re-run wget/tar command

Rust Toolchain

Run
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y && source ~/.cargo/env
Verify
rustc --version
✅ Good
rustc 1.xx.x (xxxxxxx YYYY-MM-DD)
❌ Bad
command not found → source ~/.cargo/env

UV Package Manager

Run
curl -LsSf https://astral.sh/uv/install.sh | sh && source $HOME/.local/bin/env
Verify
uv --version
✅ Good
uv 0.x.x
❌ Bad
command not found → run: source $HOME/.local/bin/env

Taskfile

Run
sh -c "$(curl --location https://taskfile.dev/install.sh)" -- -d -b /usr/local/bin
Verify
task --version
✅ Good
Task version: vx.x.x
❌ Bad
permission denied → check /usr/local/bin permissions

tmux

Run
apt-get update -qq && apt-get install -y tmux

Python 3.12

⚠️ On RunPod, the deadsnakes PPA is blocked — apt-get install python3.12 will fail with "Unable to locate package". Use UV to install Python 3.12 instead (UV is already installed above).
Install Python 3.12 via UV
uv python install 3.12
Make it the system default
ln -sf $(uv python find 3.12) /usr/local/bin/python3.12 && update-alternatives --install /usr/bin/python3 python3 /usr/local/bin/python3.12 1 && python3 --version
✅ Good
Python 3.12.x
❌ Bad
command not found → re-run uv python install 3.12

02 Clone & Build

Clone Repository

Run
cd /root && git clone https://github.com/pearl-research-labs/pearl.git && cd pearl
✅ Good
Cloning into 'pearl'... done. You are now in /root/pearl
❌ Bad
fatal: repository not found → check internet
ℹ️ All build commands must run from /root/pearl directory. Verify with: pwd → should show /root/pearl

Build Blockchain

Run (from /root/pearl)
cd /root/pearl && task build:blockchain
✅ Good
Build completes without errors
❌ Bad
go: command not found → export PATH=$PATH:/usr/local/go/bin
Verify binaries exist
ls -la /root/pearl/bin/pearld /root/pearl/bin/oyster /root/pearl/bin/prlctl
✅ Good
All 3 files listed with size >0
❌ Bad
No such file → build failed, check task output for errors

Build Miner (~20-25 minutes)

⚠️ This takes 20-25 minutes. Do NOT interrupt it! First run compiles CUDA kernels.
Run (from /root/pearl)
cd /root/pearl && export UV_CACHE_DIR=/workspace/.uv-cache && export HF_HOME=/workspace/.hf && task build:miner
✅ Good
Installed 265 packages — vllm==0.20.0+cu129 in list
❌ Bad
CUDA build failed → check nvidia-smi shows H100/H200
Verify venv exists
ls /root/pearl/.venv/bin/vllm && ls /root/pearl/.venv/bin/pearl-gateway
✅ Good
Both files listed — build successful
❌ Bad
No such file → miner build failed, re-run task build:miner

02b Set Environment Variables (Persistent)

🚨 CRITICAL: Do this BEFORE starting the miner. Env vars must be in ~/.bashrc so they survive across shell sessions and tmux windows. If you only export them inline, the gateway will fail with "mining_address: Field required" because the vars don't reach the tmux session.

Add all required env vars to ~/.bashrc now (you will update PEARLD_MINING_ADDRESS after Step 3):

Add to ~/.bashrc
cat >> ~/.bashrc << 'EOF'
export PEARLD_RPC_URL=http://localhost:44107
export PEARLD_RPC_USER=rpcuser
export PEARLD_RPC_PASSWORD=rpcpass
export PEARLD_MINING_ADDRESS=PLACEHOLDER
export HF_HOME=/workspace/.hf
export VLLM_USE_DEEP_GEMM=0
export RAYON_NUM_THREADS=96
EOF
source ~/.bashrc && echo $VLLM_USE_DEEP_GEMM
✅ Good
Prints: 0
❌ Bad
Empty output → vars not set, re-run the cat command
⚠️ After generating your mining address in Step 3, update ~/.bashrc: replace PLACEHOLDER with your real address, then run source ~/.bashrc before starting the miner in Step 4.
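A quick way to catch a forgotten PLACEHOLDER or a missing variable before launching is to check the environment from Python. This is a minimal sketch; the variable names are the ones added to ~/.bashrc above, and the `env_problems` helper is a made-up name.

```python
import os

REQUIRED = [
    "PEARLD_RPC_URL", "PEARLD_RPC_USER", "PEARLD_RPC_PASSWORD",
    "PEARLD_MINING_ADDRESS", "HF_HOME", "VLLM_USE_DEEP_GEMM", "RAYON_NUM_THREADS",
]


def env_problems(env):
    """Return a list of problems found in a mapping of environment variables."""
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]
    if env.get("PEARLD_MINING_ADDRESS") == "PLACEHOLDER":
        problems.append("PEARLD_MINING_ADDRESS is still PLACEHOLDER")
    if env.get("VLLM_USE_DEEP_GEMM") not in (None, "", "0"):
        problems.append("VLLM_USE_DEEP_GEMM should be 0")
    return problems


# Checks the shell this script is started from.
for problem in env_problems(os.environ):
    print("PROBLEM:", problem)
```

Run it from the same shell (or tmux session) that will start the gateway, since that is the environment that matters.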

03 Wallet & Node Setup

Create Wallet

Run
cd /root/pearl && ./bin/oyster --create
🚨 CRITICAL: Write down the 12-word seed phrase! This is your ONLY backup. If you lose it you lose all mined PRL forever.

When prompted, answer as follows:

Prompt | Answer
Do you want to add a passphrase? | No (just press Enter), or set one you'll remember
Do you have an existing seed phrase? | No
Seed phrase shown | ⚠️ WRITE IT DOWN NOW: all 12 words in order
Type OK to confirm | OK

Start tmux Sessions

Run
tmux new-session -d -s node && tmux new-session -d -s miner && tmux new-session -d -s loop
✅ Good
tmux ls shows: node, miner, loop sessions
❌ Bad
session exists → tmux kill-session -t node first

Start Blockchain Node

Run
tmux send-keys -t node "cd /root/pearl && ./bin/pearld --rpcuser=rpcuser --rpcpass=rpcpass --rpclisten=0.0.0.0:44107 --txindex --notls --maxpeers=200" Enter

Wait 30 seconds then verify:

Verify
cd /root/pearl && ./bin/prlctl -u rpcuser -P rpcpass -s localhost:44107 --notls getblockcount
✅ Good
Returns a block number (e.g., 36000+)
❌ Bad
connection refused → node not started, check tmux node session

Get Mining Address

Run
/root/pearl/bin/oyster -u rpcuser -P pearl123 --noclienttls --noservertls --pearldusername=rpcuser --pearldpassword=rpcpass > /tmp/oyster.log 2>&1 & sleep 15 && /root/pearl/bin/prlctl -u rpcuser -P pearl123 -s localhost:44207 --wallet --notls getnewaddress
✅ Good
Returns address starting with prl1p...
❌ Bad
connection refused → oyster not ready, wait longer and retry
⚠️ SAVE THIS ADDRESS! You'll need it in every restart command. Also verify it with validateaddress below.
🚨 Now update your ~/.bashrc with the real address: sed -i 's/PEARLD_MINING_ADDRESS=PLACEHOLDER/PEARLD_MINING_ADDRESS=YOUR_ACTUAL_ADDRESS/' ~/.bashrc && source ~/.bashrc && echo $PEARLD_MINING_ADDRESS — confirm it prints your address before proceeding.

Verify Address is Yours

Run (replace YOUR_ADDRESS)
/root/pearl/bin/prlctl -u rpcuser -P pearl123 -s localhost:44207 --wallet --notls validateaddress YOUR_ADDRESS
✅ Good
"ismine": true
❌ Bad
"ismine": false → wrong address, generate a new one

04 Start Mining

Start Gateway + vLLM (the mining address is read from ~/.bashrc)

🚨 Gateway and vLLM MUST start in the same tmux session! If separate, they won't connect. Also — delete any stale socket first: rm -f /tmp/pearlgw.sock
🚨 Use FULL PATHS to vllm and pearl-gateway — do NOT rely on venv activate inside tmux. The activate command often fails silently in tmux send-keys, causing "vllm: command not found".
Run
rm -f /tmp/pearlgw.sock && tmux kill-session -t miner 2>/dev/null; tmux new-session -d -s miner && tmux send-keys -t miner "cd /root/pearl && source ~/.bashrc && /root/pearl/.venv/bin/pearl-gateway start > /tmp/gateway.log 2>&1 & sleep 10 && /root/pearl/.venv/bin/vllm serve pearl-ai/Llama-3.3-70B-Instruct-pearl --host 0.0.0.0 --port 8000 --max-model-len 8192 --gpu-memory-utilization 0.9 --enforce-eager --data-parallel-size 2 --no-enable-prefix-caching --no-enable-chunked-prefill" Enter
ℹ️ Gateway logs go to /tmp/gateway.log — this keeps the miner tmux session clean so vLLM output is visible. Check gateway: tail -5 /tmp/gateway.log
⚠️ vLLM takes 10-15 minutes to load the 70B model on first run (~140GB download). Subsequent runs use cached model from /workspace/.hf and load in ~2-3 minutes.

Wait for Node to Sync Before vLLM Starts

🚨 If the node is still syncing when vLLM starts, it will crash with "mining_paused: no block template available". The node must be fully synced first, so run this check before the start command above:
Check sync status
cd /root/pearl && ./bin/prlctl -u rpcuser -P rpcpass -s localhost:44107 --notls getblockchaininfo 2>/dev/null | grep -E "blocks|headers"
✅ Synced
blocks == headers (same number)
❌ Syncing
headers > blocks → wait, re-check every 30 seconds
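The blocks/headers comparison can also be done in a script instead of by eye. The sketch below assumes `getblockchaininfo` returns JSON with integer `blocks` and `headers` fields, which is what the grep above relies on; the sample numbers are invented.

```python
import json


def sync_status(info_json):
    """Return (synced, blocks_behind) from getblockchaininfo JSON text."""
    info = json.loads(info_json)
    behind = info["headers"] - info["blocks"]
    return behind == 0, behind


# Invented sample; on the pod, pass the real prlctl output instead.
sample = '{"blocks": 36120, "headers": 36500}'
synced, behind = sync_status(sample)
print(f"synced={synced} behind={behind}")
```

Feed it the output of the prlctl command above and only start vLLM once `synced` is true.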

Verify vLLM Loaded

Check GPU Memory
nvidia-smi --query-gpu=index,memory.used --format=csv,noheader
✅ Good
0, 132964 MiB / 1, 131397 MiB
❌ Bad
0, 4 MiB / 1, 4 MiB → still loading, wait
Check Health
curl -sf http://localhost:8000/health && echo "READY" || echo "NOT READY"
✅ Good
READY
❌ Bad
NOT READY → still loading, wait and retry
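Rather than re-running the health check by hand during the 10-15 minute load, you can poll it. This is a minimal sketch using only the Python standard library; the `/health` URL is the one used above, while `vllm_healthy`, `wait_until` and the 30-second interval are assumptions for illustration.

```python
import time
import urllib.error
import urllib.request


def vllm_healthy(url="http://localhost:8000/health", timeout=5):
    """True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False  # connection refused or timed out: still loading


def wait_until(probe, attempts=60, interval=30, sleep=time.sleep):
    """Call probe() until it returns True; return the attempt number, or None on give-up."""
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        sleep(interval)
    return None
```

Calling `wait_until(vllm_healthy)` blocks for up to 30 minutes with these defaults and returns as soon as vLLM reports ready.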

Start Request Loop

🚨 Prompts MUST be randomized! Same prompts = KV caching = ZERO MINING!

Step 4b — Start the Request Worker (Python — Recommended)

✅ The Python worker is the recommended approach. It uses proper threading — no curl job accumulation, stable m values, no fork errors. The bash loop (Step 4c) is the fallback if Python worker has issues.

Create the Python worker script:

Create pearl_worker.py
python3 << 'PYEOF'
code = '''#!/usr/bin/env python3
import threading, random, requests, time, sys, signal

VLLM_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "pearl-ai/Llama-3.3-70B-Instruct-pearl"
NUM_WORKERS = 32
MAX_TOKENS = 3
WORD_LIST_LENGTH = 120
REQUEST_TIMEOUT = 120
CONSONANTS = "bcdfghjklmnpqrstvwxyz"
VOWELS = "aeiou"

def random_word(length=None):
    if length is None:
        length = random.randint(4, 10)
    return "".join(random.choice(CONSONANTS if i%2==0 else VOWELS) for i in range(length))

def build_prompt():
    bypass = random_word(random.randint(5, 12))
    words = " ".join(random_word() for _ in range(WORD_LIST_LENGTH))
    return bypass + ", decipher this secret message: " + words

class MiningWorker(threading.Thread):
    def __init__(self, wid):
        super().__init__(daemon=True)
        self.wid = wid
        self.count = 0
        self.running = True

    def run(self):
        print(f"[W{self.wid}] Started", flush=True)
        while self.running:
            try:
                r = requests.post(VLLM_URL, json={"model": MODEL, "messages": [{"role": "user", "content": build_prompt()}], "max_tokens": MAX_TOKENS}, timeout=REQUEST_TIMEOUT)
                if r.status_code == 200:
                    self.count += 1
                    if self.count % 10 == 0:
                        out = r.json().get("choices",[{}])[0].get("message",{}).get("content","")
                        print(f"[W{self.wid}] req={self.count} out=\'{out.strip()}\'", flush=True)
                else:
                    time.sleep(1)
            except requests.exceptions.Timeout:
                print(f"[W{self.wid}] Timeout, retrying...", flush=True)
                time.sleep(2)
            except requests.exceptions.ConnectionError:
                print(f"[W{self.wid}] ConnError, retrying in 5s...", flush=True)
                time.sleep(5)
            except Exception as e:
                print(f"[W{self.wid}] Error: {e}", flush=True)
                time.sleep(2)

def stats(workers):
    while True:
        time.sleep(30)
        total = sum(w.count for w in workers)
        print(f"[Stats] total={total} | " + " ".join(f"W{w.wid}:{w.count}" for w in workers), flush=True)

def main():
    print(f"Pearl Worker -- {NUM_WORKERS} workers, max_tokens={MAX_TOKENS}", flush=True)
    workers = [MiningWorker(i) for i in range(NUM_WORKERS)]

    def shutdown(s,f):
        for w in workers:
            w.running = False
        sys.exit(0)

    signal.signal(signal.SIGINT, shutdown)
    signal.signal(signal.SIGTERM, shutdown)
    for w in workers:
        w.start()
    threading.Thread(target=stats, args=(workers,), daemon=True).start()
    while True:
        time.sleep(1)

if __name__ == "__main__":
    main()
'''
with open("/root/pearl/pearl_worker.py", "w") as f:
    f.write(code)
print("Written OK")
PYEOF
python3 -c "import ast; ast.parse(open('/root/pearl/pearl_worker.py').read()); print('Syntax OK')"

Start the worker in the worker tmux session:

Start worker session and launch
tmux new-session -d -s worker && tmux send-keys -t worker "cd /root/pearl && /root/pearl/.venv/bin/python pearl_worker.py" Enter && echo "✓ Worker started"

Verify after 30 seconds:

Verify worker running
sleep 30 && nvidia-smi --query-gpu=index,utilization.gpu,power.draw --format=csv,noheader && tmux capture-pane -t worker -p -S -5 | tail -5
✅ Good
GPU 90%+, worker showing req counts and outputs like "To decipher this"
❌ Bad
Workers timing out → vLLM not ready, wait longer or restart miner
ℹ️ Why 32 workers? Each worker sends one request at a time. 32 concurrent requests keeps vLLM's batch size large enough for m=5000-8174. Too few workers (3) = m=1024 = barely mining. Too many (128+) = vLLM crashes.
ℹ️ Prompt format based on dev team recommendation: {random_word}, decipher this secret message: {120 random words} — first random word bypasses prefix caching, long word list fills the prefill matrix for maximum GEMM size.

Step 4c — Bash Loop (Fallback Only)

⚠️ Use the bash loop only if the Python worker fails. The bash loop accumulates thousands of curl jobs over time (10,000+) causing fork errors and degraded m values. The Python worker is always preferred.
🚨 Use LONG prompts (~150+ tokens)! Short prompts produce small m values (m<1024) which fail the should_use_noisy_gemm() threshold check = NO MINING. Long prefill-heavy prompts achieve m=5000-8000+ for maximum hash rate.
⚠️ Send ALL requests to port 8000 ONLY. With DP=2, vLLM exposes a single port (8000). Port 8001 does NOT exist — requests there are dropped silently.
Run
tmux send-keys -t loop "COUNT=0; while true; do COUNT=\$((COUNT+1)); for i in \$(seq 1 128); do curl -s http://localhost:8000/v1/chat/completions -H 'Content-Type: application/json' -d '{\"model\": \"pearl-ai/Llama-3.3-70B-Instruct-pearl\", \"messages\": [{\"role\": \"user\", \"content\": \"Write a detailed comprehensive academic essay about topic '\"\$COUNT\"' variant '\"\$i\"' covering the following aspects in depth: historical background and origins dating back centuries, mathematical foundations and theoretical frameworks, scientific principles and empirical evidence, technological applications and modern implementations, economic implications and market dynamics, social and cultural impacts on society, philosophical interpretations and ethical considerations, future prospects and emerging research directions, comparative analysis with related fields, and practical case studies with real world examples.\"}], \"max_tokens\": 1}' > /dev/null & done; sleep 1; done" Enter
ℹ️ The '\"\$COUNT\"' and '\"\$i\"' pieces briefly close the single-quoted JSON so the loop shell expands the counters. Left inside plain single quotes, every request would carry the literal text \$COUNT and \$i, and all prompts would be identical.

Verify Loop + Mining is Working (wait 2 minutes then run)

Check GPU utilization went up
nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader
✅ Good
Both GPUs at 90%+
❌ Bad
0% → loop not sending requests, check tmux loop session
Confirm NOISY_GEMM is firing
tmux capture-pane -t miner -p -S -5000 | grep "NOISY_GEMM" | tail -3
✅ Mining!
NOISY_GEMM_CALLED: m=5000+ n=57344 k=8192 on BOTH workers
❌ Not Mining
No output → confirm the NOISY_GEMM debug patch from Section 04d is applied (the lines do not appear without it), and use -S -5000 so the capture looks back far enough
⚠️ NOISY_GEMM output goes to the tmux buffer. Always use -S -5000 (not -S -50) to look back far enough — the buffer fills with other logs quickly.

Verify with Metrics Endpoint

The vLLM metrics endpoint is the most reliable way to confirm everything is working correctly:

Check requests running + cache hits
curl -s http://localhost:8000/metrics | grep -E "num_requests_running|cache_hit" | grep -v "^#\|reason\|external\|mm_cache"
✅ Healthy
num_requests_running engine=0: 30-50, engine=1: 30-50 | cache_hit: 0.0
❌ Problem
num_requests_running: 0.0 → loop not sending | cache_hit > 0 → caching active, prompts not random enough
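ℹ️ vLLM logs, including NOISY_GEMM lines, can also be kept in /tmp/vllm_live.log, which is useful for debugging when the tmux buffer fills up. The start command in this section does not create that file on its own; appending 2>&1 | tee /tmp/vllm_live.log to the vllm serve command is one way to get it.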
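The grep pipeline above can be mirrored in Python when you want per-engine numbers in a script. The sketch parses generic Prometheus text lines; the sample metric name and labels are assumptions modelled on the `engine=0` / `engine=1` output shown above, so check them against your own `/metrics` output.

```python
import re

# name{label="value",...} number
METRIC_LINE = re.compile(r'^(?P<name>[^\s{]+)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$')


def parse_metrics(text, needle):
    """Return (name, labels, value) for every sample line whose text contains `needle`."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#") or needle not in line:
            continue  # skip HELP/TYPE comments and unrelated metrics
        match = METRIC_LINE.match(line)
        if match:
            labels = dict(re.findall(r'(\w+)="([^"]*)"', match.group("labels") or ""))
            samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


# Hypothetical sample in Prometheus exposition format.
SAMPLE = '''# HELP vllm:num_requests_running Number of requests currently running.
vllm:num_requests_running{engine="0",model_name="m"} 41.0
vllm:num_requests_running{engine="1",model_name="m"} 39.0
'''

running = {labels["engine"]: value for _, labels, value in parse_metrics(SAMPLE, "num_requests_running")}
print(running)
```

On the pod, replace `SAMPLE` with the body fetched from `http://localhost:8000/metrics`; an engine showing 0 is the unbalanced case described above.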

04d Apply NOISY_GEMM Debug Patch — Mandatory

🚨 This step is NOT optional. NOISY_GEMM lines do not appear by default — they require this patch. Without it you cannot verify mining is working, measure your m value, or confirm both DP engines are active. Apply it now as part of every fresh setup.

This patch adds a single print statement inside the Pearl mining kernel that fires every time a valid GEMM operation is detected. It is the only way to visually confirm your miner is actually performing work and to measure your batch size (m value).

Step 1 — Apply the patch
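python3 - << 'PYEOF'
import re

path = '/root/pearl/miner/vllm-miner/src/vllm_miner/config.py'
with open(path, 'r') as f:
    content = f.read()

# Match the original return statement at whatever indentation the file uses.
pattern = re.compile(r'^([ \t]*)return \(m >= min_m\) and \(n >= min_n\) and \(k >= min_k\)[ \t]*$', re.MULTILINE)

def add_print(match):
    indent = match.group(1)
    return (indent + 'result = (m >= min_m) and (n >= min_n) and (k >= min_k)\n'
            + indent + 'if result:\n'
            + indent + '    print(f"NOISY_GEMM_CALLED: m={m} n={n} k={k}", flush=True)\n'
            + indent + 'return result')

content, count = pattern.subn(add_print, content)
if count:
    with open(path, 'w') as f:
        f.write(content)
    print('Patched!')
else:
    print('Pattern not found: already patched, or config.py has changed')
PYEOF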
Step 2 — Restart vLLM to activate the patch
pkill -9 -f "pearl-gateway"; pkill -9 -f "vllm"; pkill -9 -f "EngineCore"; sleep 3 && rm -f /tmp/pearlgw.sock && tmux kill-session -t miner && tmux new-session -d -s miner && tmux send-keys -t miner "cd /root/pearl && source ~/.bashrc && /root/pearl/.venv/bin/pearl-gateway start > /tmp/gateway.log 2>&1 & sleep 10 && /root/pearl/.venv/bin/vllm serve pearl-ai/Llama-3.3-70B-Instruct-pearl --host 0.0.0.0 --port 8000 --max-model-len 8192 --gpu-memory-utilization 0.9 --enforce-eager --data-parallel-size 2 --no-enable-prefix-caching --no-enable-chunked-prefill" Enter && echo "✓ Restarting with patch active"
Step 3 — Wait for model to load, then verify patch is working
sleep 120 && curl -s http://localhost:8000/health && echo "READY" && tmux capture-pane -t miner -p -S -5000 | grep "NOISY_GEMM" | grep "n=57344" | tail -3
What you see | What it means | Action
m=5000-8174, n=57344 on both workers | Mining at full capacity ✅ | Proceed to Step 5
m=1024, n=57344 | Batch too small: too few concurrent requests | Ensure 32 Python workers are running
n=28672 instead of 57344 | TP mode instead of DP mode | Add --data-parallel-size 2 to vLLM command
No NOISY_GEMM output at all | Patch not active or worker not started | Verify patch was applied, vLLM restarted, worker running
⚠️ The patch modifies a file in /root/pearl — it survives pod restarts as long as /root persists. However if you re-clone the repo or reinstall the miner you must reapply it.
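The table above can be turned into a small classifier for captured log lines. This sketch matches the exact string the patch prints (`NOISY_GEMM_CALLED: m=... n=... k=...`); treating every m below 5000 as "batch too small" is an assumption, since the guide only gives m=1024 as the low example.

```python
import re

PATTERN = re.compile(r"NOISY_GEMM_CALLED: m=(\d+) n=(\d+) k=(\d+)")


def classify(line):
    """Map one log line to the diagnosis in the table, or None if it is not a NOISY_GEMM line."""
    match = PATTERN.search(line)
    if not match:
        return None
    m, n, _k = map(int, match.groups())
    if n == 28672:
        return "TP mode instead of DP mode: add --data-parallel-size 2"
    if n != 57344:
        return f"unexpected n={n}"
    if m >= 5000:
        return "mining at full capacity"
    return "batch too small: check that 32 workers are running"  # assumed cutoff


# Hypothetical captured line; the prefix before NOISY_GEMM_CALLED may differ on your pod.
print(classify("(Worker pid=1234) NOISY_GEMM_CALLED: m=8174 n=57344 k=8192"))
```

Pipe the output of the `tmux capture-pane` command through it line by line to summarize both workers at once.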

05 Health Check & Diagnostics

Master Health Check (paste this every time you reconnect)

Full Diagnostic
echo "=== TMUX ==="; tmux ls
echo "=== SOCKETS ==="; ss -x | grep pearlgw | wc -l
echo "=== VLLM ==="; pgrep -f "vllm serve" | wc -l
echo "=== GPU ==="; nvidia-smi --query-gpu=index,utilization.gpu,memory.used,power.draw --format=csv,noheader
echo "=== MINING ADDRESS ==="; cat /proc/$(pgrep -f "pearl-gateway" | head -1)/environ 2>/dev/null | tr '\0' '\n' | grep "MINING_ADDRESS"
echo "=== NOISY_GEMM ==="; tmux capture-pane -t miner -p -S -5000 | grep "NOISY_GEMM" | grep "n=57344" | tail -3
echo "=== LOOP ==="; tmux capture-pane -t loop -p -S -3 | tail -2
echo "=== CURL JOBS ==="; pgrep -f "curl.*localhost:8000" | wc -l
echo "=== REQUESTS RUNNING ==="; curl -s http://localhost:8000/metrics | grep "num_requests_running" | grep -v "^#\|reason" | awk '{print $2}' | tr '\n' ' '; echo ""
echo "=== CACHE HITS ==="; curl -s http://localhost:8000/metrics | grep "cache_hit" | grep -v "^#\|external\|mm_cache" | awk '{print $2}' | tr '\n' ' '; echo ""
echo "=== PEERS ==="; /root/pearl/bin/prlctl -u rpcuser -P rpcpass -s localhost:44107 --notls getpeerinfo 2>/dev/null | grep "addr" | wc -l
echo "=== BLOCK COUNT ==="; /root/pearl/bin/prlctl -u rpcuser -P rpcpass -s localhost:44107 --notls getblockcount 2>/dev/null
echo "=== WATCHDOG ==="; cat /tmp/loop_watchdog.log 2>/dev/null || echo "No restarts yet"
echo "=== BLOCKS ==="; tmux capture-pane -t miner -p -S -5000 | grep -i "block accepted\|Block found\|proof"

Expected Healthy Values

Check | Healthy Value | Action if Wrong
TMUX SESSIONS | miner, loop, node, worker, watchdog | Recreate missing sessions
SOCKETS | 4 | Restart miner: gateway/vLLM disconnected
PEERS | 8-16 on RunPod (normal) | 16 = normal without exposed port 44108. Not a problem. See Step 0b.
VLLM | 1 | Restart miner tmux session
GPU utilization | 90-98% both GPUs | Restart Python worker and check worker session for errors
GPU power draw | 600-690W each (near 700W TDP) | Low power = GPU idle = worker stalled or vLLM degraded
GPU memory | ~132GB each | vLLM crashed: restart miner
NOISY_GEMM m value | 5000-8000+ | Use longer prompts in loop. Low m = less mining throughput.
NOISY_GEMM n value | 57344 | Must be 57344: confirms DP mode working
NOISY_GEMM workers | Both Worker PIDs firing | Only one firing = one GPU idle, restart loop
CURL JOBS | 0 (Python worker) or 500+ (bash loop) | Python worker uses threads, not curl, so 0 curl jobs is correct
REQUESTS RUNNING | 30-50 per engine, balanced (up to 70 is fine) | 0 on both engines = worker stopped or vLLM degraded. 0 on one engine = unbalanced, restart loop.
WORKER | req counts climbing, outputs visible | Timeouts = vLLM not ready. Restart worker after vLLM is READY.
CACHE HITS | 0.0 | Prompts too similar: randomize more
LOOP | Many PIDs visible, large count number | Restart loop or check watchdog log
MINING ADDRESS | Your prl1p... address | Kill gateway and restart with correct address in ~/.bashrc
WATCHDOG | No restarts yet / shows timestamps | Not running → set up loop watchdog (Section 08b)
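The thresholds in this table are easy to encode if you want an automated pass/fail. The sketch below checks a snapshot dictionary against a few of them; the dictionary keys and the `health_problems` name are invented for illustration, and the numbers are the H200 values from the table.

```python
def health_problems(snapshot):
    """Compare a snapshot of health-check values against this guide's H200 thresholds."""
    problems = []
    if snapshot["sockets"] != 4:
        problems.append("sockets != 4: restart miner")
    if snapshot["vllm_procs"] != 1:
        problems.append("vllm process count != 1: restart miner tmux session")
    for index, (util_pct, power_w) in enumerate(snapshot["gpus"]):
        if util_pct < 90:
            problems.append(f"GPU{index} utilization {util_pct}% is below 90%")
        if power_w < 600:
            problems.append(f"GPU{index} power {power_w}W is below 600W")
    if any(running == 0 for running in snapshot["requests_running"]):
        problems.append("an engine has 0 running requests")
    if any(hits > 0 for hits in snapshot["cache_hits"]):
        problems.append("cache hits above 0: prompts too similar")
    return problems


# Invented snapshots: one healthy, one matching the degraded state in Section 06.
healthy = {"sockets": 4, "vllm_procs": 1, "gpus": [(97, 688.0), (96, 690.5)],
           "requests_running": [41.0, 39.0], "cache_hits": [0.0, 0.0]}
degraded = {"sockets": 4, "vllm_procs": 1, "gpus": [(0, 120.0), (0, 121.0)],
            "requests_running": [0.0, 0.0], "cache_hits": [0.0, 0.0]}

for problem in health_problems(degraded):
    print("PROBLEM:", problem)
```

On other hardware (for example H100), lower the power and memory expectations before trusting the result.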

06 Troubleshooting

🔴 Problem: GPU shows 0% utilization persistently (confirmed on RunPod dashboard)

This is NOT a sampling artifact if RunPod dashboard also shows 0%. Root cause is almost always the request loop — either using wait instead of sleep 1, or short prompts that produce m values below the 1024 threshold.

Diagnose — check requests actually running
curl -s http://localhost:8000/metrics | grep "num_requests_running" | grep -v "^#\|reason" | awk '{print $2}' | tr '\n' ' '
✅ Good
30-50 requests running per engine
❌ Bad
0.0 0.0 → loop not running or requests completing too fast

Fix: Kill loop, restart with sleep 1 (not wait) and long prompts (~150+ tokens). See Step 4 loop command.
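
The grep/awk pipeline above can be mirrored in Python if you prefer to script the check. This is a sketch only: it assumes the metrics endpoint returns Prometheus text with one `num_requests_running` sample per engine, and the label layout in the usage example is an assumption, not confirmed output.

```python
def requests_running(metrics_text):
    """Extract num_requests_running values, one per engine, from Prometheus text."""
    values = []
    for line in metrics_text.splitlines():
        # Same filters as the shell pipeline: skip comments and 'reason' lines.
        if line.startswith("#") or "reason" in line:
            continue
        if "num_requests_running" not in line:
            continue
        # A sample line has the form: name{labels} value
        values.append(float(line.rsplit(" ", 1)[1]))
    return values
```

With two DP engines a healthy result looks like `[35.0, 33.0]`, while `[0.0, 0.0]` matches the bad case described above.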

🔴 Problem: Loop keeps stalling every 60 seconds — watchdog restarts but immediately stalls again

If the watchdog log shows restarts every 60 seconds with 0 curl jobs each time, and vLLM responds healthy but GPU stays at 0% with ~120W power draw — vLLM is in a degraded state. This happens after processing hundreds of millions of tokens continuously (typically after 1-2 days of running). vLLM responds to health checks and completes requests instantly, but stops actually using the GPU.

Signs in watchdog log:

Degraded state pattern in watchdog log
Sun May 3 10:54:38 UTC 2026 - Loop stalled (0 curl jobs), restarting...
Sun May 3 10:54:42 UTC 2026 - Loop restarted
Sun May 3 10:55:42 UTC 2026 - Loop stalled (0 curl jobs), restarting...
Sun May 3 10:55:46 UTC 2026 - Loop restarted
# Repeating every 60 seconds = vLLM degraded, not loop issue

Fix: Full restart of vLLM and gateway. Model is cached so takes ~3 minutes:

Full miner restart
pkill -9 -f "pearl-gateway"; pkill -9 -f "vllm"; pkill -9 -f "EngineCore"; pkill -9 -f "Worker"; sleep 3 && rm -f /tmp/pearlgw.sock && tmux kill-session -t miner 2>/dev/null; tmux new-session -d -s miner && tmux send-keys -t miner "cd /root/pearl && source ~/.bashrc && /root/pearl/.venv/bin/pearl-gateway start > /tmp/gateway.log 2>&1 & sleep 10 && /root/pearl/.venv/bin/vllm serve pearl-ai/Llama-3.3-70B-Instruct-pearl --host 0.0.0.0 --port 8000 --max-model-len 8192 --gpu-memory-utilization 0.9 --enforce-eager --data-parallel-size 2 --no-enable-prefix-caching --no-enable-chunked-prefill" Enter
ℹ️ After restart, also clear the watchdog log so it doesn't fill up: rm -f /tmp/loop_watchdog.log && echo "cleared"

🔴 Problem: vLLM fails at startup while DeepGEMM JIT-compiles CUDA kernels

DeepGEMM is trying to JIT-compile CUDA kernels and failing. Root cause: the VLLM_USE_DEEP_GEMM env var is not set or is not reaching the vLLM process.

Verify env var is set
echo $VLLM_USE_DEEP_GEMM
✅ Good
0
❌ Bad
Empty → add to ~/.bashrc and source it, then kill miner session and recreate

🔴 Problem: vLLM crashes with "mining_paused: no block template available"

The blockchain node is still syncing. vLLM starts but immediately crashes because there is no block to mine.

Check sync status
cd /root/pearl && ./bin/prlctl -u rpcuser -P rpcpass -s localhost:44107 --notls getblockchaininfo 2>/dev/null | grep -E "blocks|headers"

Wait until blocks == headers before starting vLLM. Can take 5-15 minutes on first launch.
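
The blocks == headers comparison can be automated. The sketch below assumes that `getblockchaininfo` prints a JSON object containing `blocks` and `headers` fields, which is what the grep above suggests but is not confirmed here. The function name `is_synced` is illustrative.

```python
import json


def is_synced(blockchaininfo_json):
    """Return True when the node reports as many blocks as headers."""
    info = json.loads(blockchaininfo_json)
    return info["blocks"] == info["headers"]
```

Feed it the raw output of the prlctl command above and only start vLLM once it returns True.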

🔴 Problem: Gateway crashes with "mining_address: Field required"

The PEARLD_MINING_ADDRESS env var is not reaching the gateway process. This happens when env vars are only exported inline rather than in ~/.bashrc, or when the miner tmux session was created before the vars were set.

Fix
echo $PEARLD_MINING_ADDRESS

If empty: add to ~/.bashrc, source it, then kill and recreate the miner tmux session before restarting.

🔴 Problem: "vllm: command not found" in miner tmux session

source .venv/bin/activate often fails silently inside tmux send-keys, so vllm is not in PATH.

Fix: Always use FULL PATHS: /root/pearl/.venv/bin/vllm and /root/pearl/.venv/bin/pearl-gateway instead of relying on venv activation.

🔴 Problem: Socket count is 0 after restart

Stale socket file from previous run. Gateway creates /tmp/pearlgw.sock and won't overwrite it.

Fix
rm -f /tmp/pearlgw.sock && echo "cleared"

Always delete the socket before restarting. Add to all restart procedures.

Fix
pkill -9 -f "pearl-gateway"; pkill -9 -f "vllm"; pkill -9 -f "EngineCore"; sleep 5

Then restart miner with full command from Step 4.

🔴 Problem: Socket count is 0

Gateway and vLLM are not connected. Happens when they start in separate sessions.

Fix
pkill -9 -f "pearl-gateway"; pkill -9 -f "vllm"; pkill -9 -f "EngineCore"; sleep 5

Restart BOTH gateway and vLLM together in the SAME miner session.

🔴 Problem: NOISY_GEMM n is 28672 (not 57344)

TP mode is active instead of DP. Restart with --data-parallel-size 2 flag.

Verify
pgrep -f "vllm serve" | xargs -I{} cat /proc/{}/cmdline | tr '\0' ' ' | grep "data-parallel"
✅ Good
--data-parallel-size 2 visible
❌ Bad
Not visible → restart with correct flag
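
The n value can also be read straight from the miner log. This sketch assumes log lines in the format produced by the NOISY_GEMM debug patch (`NOISY_GEMM_CALLED: m=... n=... k=...`). The function name `parallel_mode` is hypothetical, and the k value in the usage example is made up for illustration.

```python
import re

NOISY_RE = re.compile(r"NOISY_GEMM_CALLED: m=(\d+) n=(\d+) k=(\d+)")


def parallel_mode(log_text):
    """Classify the most recent NOISY_GEMM line as 'DP', 'TP' or 'unknown'.

    Returns None when the log contains no NOISY_GEMM line at all.
    """
    matches = NOISY_RE.findall(log_text)
    if not matches:
        return None
    n = int(matches[-1][1])
    if n == 57344:
        return "DP"  # data-parallel mode, the expected value
    if n == 28672:
        return "TP"  # tensor-parallel mode, restart with --data-parallel-size 2
    return "unknown"
```

For example, `parallel_mode("NOISY_GEMM_CALLED: m=8174 n=57344 k=8192")` returns `"DP"`.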

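🔴 Problem: No blocks found after hours

Long gaps are expected once difficulty rises: the Difficulty Context table in Section 08b lists 6-8+ hours per block on 2x H200 above difficulty 150,000. Before assuming a fault, confirm the health check values and the mining address.
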
🔴 Problem: Wrong mining address in gateway

Check actual address
cat /proc/$(pgrep -f "pearl-gateway" | head -1)/environ | tr '\0' '\n' | grep "MINING_ADDRESS"

If wrong: pkill -f "pearl-gateway" then restart miner with correct PEARLD_MINING_ADDRESS.
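
The same check can be done in Python by parsing the NUL-separated environment block of the gateway process. This is a minimal sketch; the function name is illustrative, and it assumes the variable is called PEARLD_MINING_ADDRESS as described earlier in this section.

```python
def mining_address_from_environ(raw):
    """Return PEARLD_MINING_ADDRESS from the bytes of /proc/<pid>/environ, or None."""
    for entry in raw.split(b"\0"):
        name, sep, value = entry.partition(b"=")
        if sep and name == b"PEARLD_MINING_ADDRESS":
            return value.decode()
    return None
```

To use it on a live process, read the file in binary mode, for example `open(f"/proc/{pid}/environ", "rb").read()`, where `pid` is the gateway PID reported by pgrep.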

🟡 Problem: vLLM keeps dying — add miner watchdog

If vLLM crashes repeatedly, add a watchdog that monitors and restarts it automatically. Note: this is separate from the loop watchdog below.

Create miner watchdog
cat > /root/pearl/watchdog.sh << 'EOF'
#!/bin/bash
while true; do
  VLLM=$(pgrep -f "vllm serve" | wc -l)
  SOCK=$(ss -x | grep pearlgw | wc -l)
  if [ "$VLLM" -eq 0 ] || [ "$SOCK" -lt 2 ]; then
    echo "$(date) - Restarting miner..." >> /tmp/watchdog.log
    pkill -9 -f "pearl-gateway"; pkill -9 -f "vllm"; pkill -9 -f "EngineCore"
    sleep 5
    rm -f /tmp/pearlgw.sock
    cd /root/pearl && source ~/.bashrc && \
      /root/pearl/.venv/bin/pearl-gateway start > /tmp/gateway.log 2>&1 &
    sleep 10 && \
      /root/pearl/.venv/bin/vllm serve pearl-ai/Llama-3.3-70B-Instruct-pearl \
      --host 0.0.0.0 --port 8000 --max-model-len 8192 \
      --gpu-memory-utilization 0.9 --enforce-eager \
      --data-parallel-size 2 --no-enable-prefix-caching \
      --no-enable-chunked-prefill &
    sleep 900
  fi
  sleep 60
done
EOF
chmod +x /root/pearl/watchdog.sh && tmux new-session -d -s watchdog && tmux send-keys -t watchdog "/root/pearl/watchdog.sh" Enter && echo "✓ Miner watchdog running"

🔄 Worker Watchdog — Required

Monitors the Python worker and restarts it if it stops. Checks every 60 seconds.

Create worker watchdog
cat > /root/loop_watchdog.sh << 'EOF'
#!/bin/bash
while true; do
  WORKER_COUNT=$(pgrep -f "pearl_worker.py" | wc -l)
  if [ "$WORKER_COUNT" -lt 1 ]; then
    echo "$(date) - Worker stopped, restarting..." >> /tmp/loop_watchdog.log
    tmux send-keys -t worker C-c 2>/dev/null
    sleep 2
    tmux send-keys -t worker "cd /root/pearl && /root/pearl/.venv/bin/python pearl_worker.py" Enter
    echo "$(date) - Worker restarted" >> /tmp/loop_watchdog.log
  fi
  sleep 60
done
EOF
chmod +x /root/loop_watchdog.sh && tmux new-session -d -s watchdog && tmux send-keys -t watchdog "/root/loop_watchdog.sh" Enter && echo "✓ Watchdog running" && tmux ls | grep watchdog
Check watchdog log
cat /tmp/loop_watchdog.log 2>/dev/null || echo "No restarts yet"
Create loop watchdog (bash loop setups only; overwrites the worker watchdog script above)
cat > /root/loop_watchdog.sh << 'EOF'
#!/bin/bash
while true; do
  CURL_COUNT=$(pgrep -f "curl.*localhost:8000" | wc -l)
  if [ "$CURL_COUNT" -lt 10 ]; then
    echo "$(date) - Loop stalled (${CURL_COUNT} curl jobs), restarting..." >> /tmp/loop_watchdog.log
    tmux send-keys -t loop C-c 2>/dev/null
    sleep 2
    pkill -f "curl.*localhost:8000" 2>/dev/null
    sleep 2
    # \" yields a plain " in the JSON; '\$COUNT' and '\$i' step outside the single quotes so the loop shell expands them
    tmux send-keys -t loop "COUNT=0; while true; do COUNT=\$((COUNT+1)); for i in \$(seq 1 128); do curl -s http://localhost:8000/v1/chat/completions -H 'Content-Type: application/json' -d '{\"model\": \"pearl-ai/Llama-3.3-70B-Instruct-pearl\", \"messages\": [{\"role\": \"user\", \"content\": \"Write a detailed comprehensive academic essay about topic '\$COUNT' variant '\$i' covering the following aspects in depth: historical background and origins dating back centuries, mathematical foundations and theoretical frameworks, scientific principles and empirical evidence, technological applications and modern implementations, economic implications and market dynamics, social and cultural impacts on society, philosophical interpretations and ethical considerations, future prospects and emerging research directions, comparative analysis with related fields, and practical case studies with real world examples.\"}], \"max_tokens\": 1}' > /dev/null & done; sleep 1; done" Enter
    echo "$(date) - Loop restarted" >> /tmp/loop_watchdog.log
  fi
  sleep 60
done
EOF
chmod +x /root/loop_watchdog.sh && tmux new-session -d -s watchdog && tmux send-keys -t watchdog "/root/loop_watchdog.sh" Enter && echo "✓ Loop watchdog running" && tmux ls | grep watchdog
✅ Good
watchdog: 1 windows (created ...)
❌ Bad
session already exists → kill existing: tmux kill-session -t watchdog, then retry
Check watchdog log anytime
cat /tmp/loop_watchdog.log 2>/dev/null || echo "No restarts yet"

07 Block Verification

Check Logs for Block Activity

Run
tmux capture-pane -t miner -p -S -50000 | grep -i "block accepted\|Block found\|proof\|submit"
✅ Block Found!
Block accepted by node! — submission_service.py
Block submission result: {'status': 'accepted'}
❌ No Output
No blocks found yet — check difficulty and wait

Check Explorer

Open in browser
https://explorer.pearlresearch.ai/address/YOUR_MINING_ADDRESS
✅ Good
Shows balance and transaction history with PRL received
❌ Bad
Address Not Found — no confirmed blocks yet (normal if new)

08 Quick Restart Reference

Quick Status Check (paste after reconnecting)

Run
pgrep -f "vllm serve" | wc -l && ss -x | grep pearlgw | wc -l && nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader
✅ Healthy
1 / 4 / 97% / 97%
❌ Dead
0 / 0 / 0% / 0% → do full restart below

Full Clean Restart (address already in ~/.bashrc)

Run
pkill -9 -f "pearl-gateway"; pkill -9 -f "vllm"; pkill -9 -f "EngineCore"; pkill -9 -f "Worker"; sleep 3 && rm -f /tmp/pearlgw.sock && tmux kill-session -t miner 2>/dev/null; tmux new-session -d -s miner && tmux send-keys -t miner "cd /root/pearl && source ~/.bashrc && /root/pearl/.venv/bin/pearl-gateway start > /tmp/gateway.log 2>&1 & sleep 10 && /root/pearl/.venv/bin/vllm serve pearl-ai/Llama-3.3-70B-Instruct-pearl --host 0.0.0.0 --port 8000 --max-model-len 8192 --gpu-memory-utilization 0.9 --enforce-eager --data-parallel-size 2 --no-enable-prefix-caching --no-enable-chunked-prefill" Enter

Restart Loop Only

Run
tmux send-keys -t loop C-c

Then send the full loop command from Step 4.

08b Additional Critical Notes

Oyster Wallet Keeps Dying — This is Normal!

ℹ️ Oyster dies frequently. Mining does NOT need oyster running. Oyster is only needed to check balance or generate new addresses. You can ignore oyster dying.
Only run oyster when needed for balance check
/root/pearl/bin/oyster -u rpcuser -P pearl123 --noclienttls --noservertls --pearldusername=rpcuser --pearldpassword=rpcpass > /tmp/oyster.log 2>&1 & sleep 15 && /root/pearl/bin/prlctl -u rpcuser -P pearl123 -s localhost:44207 --wallet --notls getbalance

Model Download (First Run Only)

⚠️ First time vLLM runs it downloads the 70B model (~140GB). This takes 15-30 extra minutes. Subsequent runs use cached model from /workspace/.hf
Watch download progress
tmux capture-pane -t miner -p -S -20 | grep -i "download\|Downloading\|fetching"

NOISY_GEMM Not Showing — Debug Patch

ℹ️ The full NOISY_GEMM patch setup is in Step 04d of the main guide. If you followed the guide from the start, the patch is already applied. This section is only needed if you reinstalled the miner or skipped Step 04d.
Reapply patch (after reinstall only)
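python3 -c "
import re
path = '/root/pearl/miner/vllm-miner/src/vllm_miner/config.py'
with open(path) as f:
    content = f.read()
# Match the return statement at whatever indentation the file uses
pattern = re.compile(r'^([ \t]*)return \(m >= min_m\) and \(n >= min_n\) and \(k >= min_k\)', re.M)
def patch(match):
    ind = match.group(1)
    return (ind + 'result = (m >= min_m) and (n >= min_n) and (k >= min_k)\n'
            + ind + 'if result:\n'
            + ind + '    print(f\"NOISY_GEMM_CALLED: m={m} n={n} k={k}\", flush=True)\n'
            + ind + 'return result')
content, count = pattern.subn(patch, content)
with open(path, 'w') as f:
    f.write(content)
# count is 0 when the file was already patched or the source line changed
print('Patched.' if count else 'Pattern not found: already patched or file changed')
"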

After reapplying, restart vLLM and check: tmux capture-pane -t miner -p -S -5000 | grep "NOISY_GEMM" | grep "n=57344" | tail -3

OCR/Screenshot Address Warning

🚨 If you copy your mining address from a screenshot using OCR (Gemini, Google Lens, etc.) — NEVER trust it! Characters like 5/s, 0/O, m/n, l/1 are commonly confused. Always verify the address manually character by character or use the validateaddress command.

Difficulty Context

Difficulty | Expected Block Time (2x H200) | Status
~29,000 | ~1 block/hour | Early network (April 27, 2026)
~68,000 | ~2 hours/block | Day 3
~115,000 | ~4 hours/block | Day 4
>150,000 | 6-8+ hours/block | Highly competitive
Check current difficulty
cd /root/pearl && ./bin/prlctl -u rpcuser -P rpcpass -s localhost:44107 --notls getblockchaininfo 2>/dev/null | grep -E "blocks|difficulty"
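
For a rough estimate at other difficulty values, expected time per block at a fixed hashrate scales approximately linearly with difficulty, which is broadly what the table rows show. The sketch below scales from the first table row (difficulty ~29,000, ~1 hour per block on 2x H200). It is an approximation under that linearity assumption, not a figure from the Pearl team, and the function name is illustrative.

```python
def expected_hours_per_block(difficulty, ref_difficulty=29_000, ref_hours=1.0):
    """Scale the reference block time linearly with difficulty (rough estimate)."""
    return ref_hours * difficulty / ref_difficulty
```

For difficulty 115,000 this gives just under 4 hours, in line with the Day 4 row.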

Separate vLLM tmux Session Warning

🚨 If you have a tmux session called "vllm" from a previous setup — it can cause confusion. Old Worker processes may still show NOISY_GEMM but be disconnected from the gateway. Always check sockets (must be 4) to confirm connection, not just NOISY_GEMM output.
Kill stale vllm session if exists
tmux kill-session -t vllm 2>/dev/null; echo "done"

Gateway Debug Mode

Add --debug flag to gateway for more verbose logs including block submissions:

In the miner startup command, replace
pearl-gateway start
With
pearl-gateway --debug start

Full Loop Command (for Step 8 restarts)

Restart loop
tmux send-keys -t loop C-c; sleep 2; pkill -f "curl.*localhost:8000"; sleep 2; tmux send-keys -t loop "COUNT=0; while true; do COUNT=\$((COUNT+1)); for i in \$(seq 1 128); do curl -s http://localhost:8000/v1/chat/completions -H 'Content-Type: application/json' -d '{\"model\": \"pearl-ai/Llama-3.3-70B-Instruct-pearl\", \"messages\": [{\"role\": \"user\", \"content\": \"Write a detailed comprehensive academic essay about topic '\$COUNT' variant '\$i' covering the following aspects in depth: historical background and origins dating back centuries, mathematical foundations and theoretical frameworks, scientific principles and empirical evidence, technological applications and modern implementations, economic implications and market dynamics, social and cultural impacts on society, philosophical interpretations and ethical considerations, future prospects and emerging research directions, comparative analysis with related fields, and practical case studies with real world examples.\"}], \"max_tokens\": 1}' > /dev/null & done; sleep 1; done" Enter

Node Peer Count Check

Check peers (need 8+)
cd /root/pearl && ./bin/prlctl -u rpcuser -P rpcpass -s localhost:44107 --notls getpeerinfo 2>/dev/null | grep "addr" | wc -l
✅ Good
8+ peers
❌ Bad
0-2 peers → node not synced yet, wait longer

09 Key Lessons Learned

Critical Mistake | Consequence | Fix
Using `wait` in request loop | GPU goes to 0% between batches: burst/idle pattern, very inefficient | Use `sleep 1` instead to keep requests continuously overlapping
Sending requests to port 8001 | DP=2 only exposes port 8000, so port 8001 requests are dropped | Always send all requests to port 8000 only
Using --tensor-parallel-size 2 | Reduces n to 28672, less mining efficiency | Use --data-parallel-size 2
Prefix caching enabled | Same prompts cached: NO GEMM = NO MINING | Always use --no-enable-prefix-caching
Gateway in separate session from vLLM | Socket not connected, env vars not inherited | Start both in same tmux miner session
Sending same prompt repeatedly | KV cache kicks in, GEMM skipped entirely | Randomize with COUNT and i variables
config.yaml thresholds at 1 | Overhead without benefit for our matrix sizes | Keep at 1024 (default)
Not verifying mining address | Blocks could go to wrong wallet | Always validateaddress + check /proc environ
MINER_DEBUG env vars | Don't reach EngineCore subprocess | Use PEARL_LOG_LEVEL=DEBUG instead
Proof of Working Setup: Confirmed mining with NOISY_GEMM_CALLED: m=8174, n=57344 on both workers. GPU 0: 96%, 690W. GPU 1: 97%, 689W. One confirmed block on explorer with 2884 PRL (May 1, 2026). Second miner confirmed operational May 2, 2026. Setup: RunPod 2x H200 SXM, CUDA 12.8, DP=2, 128 concurrent long-prompt requests, sleep 1 loop.

10 Important Gotchas & Edge Cases

Password Confusion — Two Different Passwords!

Service | Username | Password | Port
pearld node (prlctl) | rpcuser | rpcpass | 44107
oyster wallet (prlctl --wallet) | rpcuser | pearl123 | 44207
🚨 Using wrong password is a common mistake! Node uses "rpcpass", wallet uses "pearl123"

Normal Warning Messages (Not Errors — Ignore These)

These are NORMAL — do not worry about them
Error creating a default config file: open /root/.oyster/oyster.conf: no such file or directory
Error creating a default config file: open /root/.pearld/pearld.conf: no such file or directory
Warning: Running on mainnet with --noclienttls is not recommended
Warning: Running on mainnet with --noservertls is not recommended

Block Accepted ≠ Block Confirmed

⚠️ Seeing "Block accepted by node!" in logs does NOT guarantee the block makes it to the main chain. It can be orphaned if another miner found a block at the same height faster. The explorer is the ONLY ground truth for confirmed blocks and PRL balance.

validateaddress ismine: true is NOT 100% Reliable

⚠️ We discovered that validateaddress can return ismine: true even with a slightly different address (possible OCR corruption). Always verify the address character by character manually — don't rely solely on ismine: true.
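
A character-by-character comparison is easy to script, which helps when the address is long. The sketch below compares a trusted copy of the address (for example the one printed by the wallet) with the copy you are about to use. The function name and the sample strings are illustrative only.

```python
def address_diff(expected, candidate):
    """Return (position, expected_char, candidate_char) for every mismatch."""
    diffs = []
    for pos in range(max(len(expected), len(candidate))):
        a = expected[pos] if pos < len(expected) else ""
        b = candidate[pos] if pos < len(candidate) else ""
        if a != b:
            diffs.append((pos, a, b))
    return diffs
```

An empty list means the two strings are identical. Any entry points to the exact position where OCR or a typo changed a character.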

sleep 10 Between Gateway and vLLM

The startup command uses pearl-gateway start & sleep 10 && vllm serve ...

The & runs gateway in background, sleep 10 gives it time to create the socket, then vLLM starts and connects to it. If vLLM starts before the socket exists, they won't connect.
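
A fixed sleep works, but the same idea can be expressed as a poll that waits until the socket file actually exists. This is a sketch of that alternative, not how the guide's startup command works. The socket path comes from this guide, while the function name, timeout, and poll interval are assumptions.

```python
import os
import time


def wait_for_socket(path="/tmp/pearlgw.sock", timeout=30.0, poll=0.5):
    """Return True once path exists, or False if it is still missing after timeout seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(poll)
    return os.path.exists(path)
```

A wrapper script could call this after starting the gateway and launch vLLM only when it returns True.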

HuggingFace Token

The pearl-ai model downloaded fine without an HF token in our setup. If you get auth errors:

Set HF token if needed
export HF_TOKEN=your_token_here

vLLM Process Name vs api_server

ℹ️ Some old diagnostic scripts use pgrep -f "api_server" to detect vLLM. This returns 0 even when vLLM IS running! Always use pgrep -f "vllm serve" instead.

tmux Buffer Limitation

By default tmux only stores a limited scroll buffer. Block activity messages from hours ago may not appear in tmux capture-pane. The explorer is more reliable for historical block confirmation.

Wallet Address from Same Seed

Running getnewaddress multiple times generates different addresses — all from the same seed phrase, all recoverable. But only one address is set as the mining address at a time. The second address generated (prl1p8jt0...) is a valid backup address from the same wallet.

Python Worker — Number of Workers vs m Value

The number of Python workers directly controls vLLM's batch size and therefore the m value:

Workers | m Value | GPU Util | Result
3 (GPUs+1) | ~1024 (minimum) | 0% | Barely mining: too few concurrent requests
16 | ~2000-4000 | ~60% | Partial mining
32 | 5000-8174 | 90-98% | ✅ Optimal, recommended
128+ | N/A | 0% | vLLM crashes from overload
ℹ️ The dev team's "GPUs+1=3 workers" advice applies to generation-heavy workloads. For prefill-heavy mining with H200s, 32 workers is the sweet spot.

Always Start Python Worker AFTER vLLM is READY

Starting the Python worker before vLLM is fully loaded causes a degraded state — vLLM accepts connections but stops processing requests. Always verify curl -s http://localhost:8000/health returns READY and metrics show 0.0 0.0 before starting the worker.

Bash Loop Accumulates Jobs — Degrades Over Time

The bash loop with sleep 1 fires 128 new curl jobs every second regardless of whether previous jobs finished. Over hours this accumulates to 10,000-15,000 concurrent processes, causing fork errors and degrading m values from 8174 down to 1000-3000. The Python worker avoids this entirely by using threads instead of processes.

vLLM Degrades After ~1-2 Days — Needs Periodic Restart

After processing hundreds of millions of tokens, vLLM can enter a degraded state where it responds to health checks and completes requests instantly but stops using the GPU. Requests complete in milliseconds with 0% GPU utilization and ~120W power draw. The watchdog loop restarts every 60 seconds but immediately stalls again — this is the telltale sign.

Fix: Full restart of vLLM and gateway. Model is cached so back online in ~3 minutes. Plan for a periodic restart every 24-48 hours as preventive maintenance.

⚠️ If your watchdog log shows restarts every 60 seconds for more than 5 minutes straight — stop restarting the loop and do a full miner restart instead. See Troubleshooting section.
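
The "every 60 seconds for more than 5 minutes" rule can be checked against the watchdog log with a short script. The sketch below assumes log lines in the format shown in the Troubleshooting section (`date` output, then " - Loop stalled ..."), with an English month name and a single timezone throughout. Function names and the 90-second gap tolerance are assumptions.

```python
from datetime import datetime


def restart_times(log_text):
    """Parse timestamps of 'stalled' entries such as 'Sun May 3 10:54:38 UTC 2026 - Loop stalled ...'."""
    times = []
    for line in log_text.splitlines():
        if "stalled" not in line:
            continue
        stamp = line.split(" - ", 1)[0].split()
        # stamp = [weekday, month, day, time, timezone, year]; the timezone is ignored
        _, month, day, clock, _, year = stamp
        times.append(datetime.strptime(f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y"))
    return times


def looks_degraded(log_text, window_s=300, max_gap_s=90):
    """True if stalls repeat with gaps of at most max_gap_s for longer than window_s."""
    run_start = None
    prev = None
    for t in restart_times(log_text):
        if prev is not None and (t - prev).total_seconds() <= max_gap_s:
            if (t - run_start).total_seconds() > window_s:
                return True
        else:
            run_start = t  # first entry, or the gap was too long: start a new run
        prev = t
    return False
```

If `looks_degraded` returns True for the contents of /tmp/loop_watchdog.log, do a full miner restart rather than another loop restart.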

Loop Stalls Silently — GPU Goes to 0%

The request loop can stall without any error message. Curl jobs drop to 0, GPU goes to 0%, but vLLM stays running and appears healthy. This happens because bash accumulates too many background jobs over time.

Signs: GPU 0% on RunPod dashboard, power draw drops to ~120W, NOISY_GEMM stops firing in tmux buffer, curl job count is 0.

Fix: Kill loop, restart it. Always set up the loop watchdog (Section 08b) to handle this automatically — it checks every 60 seconds and restarts if curl jobs drop below 10.

Unbalanced DP Engines — One Worker Firing Less

Sometimes requests distribute unevenly between the two DP engines — one engine gets 35 requests, the other gets 0-7. This shows as low m values on one Worker and lower GPU utilization. Root cause: loop stalled and restarted unevenly.

Fix: Restart the loop cleanly. Kill all curl jobs first, verify 0 remaining, then restart. The engines rebalance within the next batch.

ℹ️ Check balance with: curl -s http://localhost:8000/metrics | grep "num_requests_running" | grep -v "^#\|reason" | awk '{print $2}' | tr '\n' ' ' — both engines should show similar numbers.

Only 16 Peers — Discord Reports 200+

On RunPod (and most cloud providers), inbound connections are blocked by default. Your node can connect OUT to other peers but other nodes cannot connect IN to you. This limits you to ~8-16 outbound peers regardless of your --maxpeers setting.

The fix is exposing port 44108 before deploying your pod (see Step 0b). If already deployed, wait until next natural restart.

ℹ️ 16 peers is sufficient for mining. Block propagation works fine outbound-only. The difference between 16 and 200 peers is milliseconds of propagation time — negligible compared to the time between blocks.
⚠️ If you try to get a mining address before the node syncs, the address may be invalid. Wait at least 30-60 seconds after starting pearld and verify getblockcount returns a number before running getnewaddress.

Bash Loop vs Python Worker — Full Comparison

Both approaches send requests to vLLM to keep it busy mining. Here's how they compare after extensive real-world testing:

Aspect | Bash Loop (sleep 1) | Python Worker (32 threads)
Concurrent requests | 128 new jobs per second, uncapped | Exactly 32 at all times
Job accumulation | Grows to 10,000-15,000+ over hours | Always exactly 32, never accumulates
m value (fresh start) | 8174 (peak) | 5000-8174 (consistent)
m value after 6+ hours | Degrades to 1000-3000 | Stays at 5000-8174
Fork errors | Yes, after thousands of jobs accumulate | Never
GPU utilization | 90-98% initially, degrades over time | 90-98% stable indefinitely
Stability | Requires loop restarts every few hours | Runs indefinitely without restarts
Prompt format | Long essay prompts (~150 tokens) | Decipher format, dev team recommended
Output tokens | max_tokens=1 | max_tokens=3, can eyeball outputs
Complexity | Simple bash, easy to understand | Requires Python file on server
Watchdog needed | Yes, loop stalls frequently | Rarely, threads auto-retry
vLLM crash risk | High: floods vLLM with thousands of requests | Low: controlled concurrency

Why the bash loop worked at all

The bash loop fires 128 new curl requests every second as background processes. On a fresh start, this floods vLLM with enough concurrent requests to fill the batch (m=8174). The GPU runs at near 100%. However, because it never waits for jobs to finish before firing new ones, jobs accumulate indefinitely. After 6-12 hours you have 10,000+ zombie curl processes, fork errors appear, and vLLM's batch scheduler starts behaving erratically — m values drop and GPU utilization degrades.

Why wait doesn't work as a substitute for sleep 1

Using wait in the bash loop makes it fire 128 jobs then wait for ALL of them to complete before firing the next batch. On H200s with fast prefill, batches complete in ~1 second — but there's still a gap between batches where the GPU sits idle at 0%. This burst/idle pattern is inefficient. sleep 1 overlaps batches continuously but causes accumulation. Neither is ideal — which is why the Python worker is better.

Why sleep 1 works short-term but fails long-term

sleep 1 was the discovered middle ground between wait and uncapped firing. Instead of waiting for all 128 jobs to finish, it fires a new batch every second regardless — keeping requests continuously overlapping so the GPU never idles. This produces m=8174 and 90-98% GPU utilization on a fresh start.

The problem: every second, 128 new background processes are created whether or not the previous ones finished. vLLM processes requests in ~1-3 seconds each, so whenever a batch takes longer than one second, unfinished curl processes carry over and the backlog grows. After 2 hours: ~15,000 lingering curl processes, a net growth of roughly 2 per second. The OS hits its process limit (fork errors), vLLM's batch scheduler degrades under the queue pressure, and m values fall from 8174 to 1000-3000. A loop restart temporarily fixes it but the cycle repeats.

The Python worker solves this by design — threads block on the HTTP response, so there are always exactly NUM_WORKERS requests in flight. No accumulation, no degradation.
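
The blocking-thread pattern described above can be sketched as follows. This is not the actual pearl_worker.py, which is not reproduced in this section. `send` is a hypothetical callable that performs one HTTP request (for example a POST to http://localhost:8000/v1/chat/completions) and returns only when the response arrives. The real worker runs indefinitely, whereas this sketch stops after a fixed number of requests so it can be run and inspected.

```python
import threading


def run_workers(send, num_workers=32, requests_per_worker=10):
    """Run num_workers threads; each sends a request, blocks on the reply, then sends the next."""

    def worker(worker_id):
        for seq in range(requests_per_worker):
            # send() blocks until the response arrives, so this thread never
            # has more than one request in flight.
            send(f"worker {worker_id} request {seq}")

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Because each thread waits for its own response, the number of requests in flight can never exceed `num_workers`, which is the property the bash loop lacks.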

Why 32 Python workers is the sweet spot

Each Python worker sends one request, waits for the response, then immediately sends the next. With 32 workers running simultaneously, there are always exactly 32 requests in flight. On 2x H200 with DP=2, this keeps both engines busy enough to produce m=5000-8174. Fewer workers (3, as suggested by the dev team) produce m=1024, the minimum threshold, because H200s process requests so fast that the batch is empty most of the time. More workers (128+) overwhelm vLLM's queue and cause it to crash.

ℹ️ Recommendation: Use the Python worker (32 workers) for all new setups. Keep the bash loop command saved as a fallback. If the Python worker ever fails to start or causes issues, the bash loop will get you mining immediately while you debug.

Community Resources

☄️ lordofpearls.xyz — First independent Pearl block explorer. Live network metrics, top miners, top holders, hashrate since genesis, world node map. Built by lordkuba. Free.
🚨 vLLM will crash immediately on startup with "mining_paused: no block template available" if the blockchain node is still syncing. Always verify blocks == headers before starting the miner. The node typically takes 5-15 minutes to sync on first launch.

Env Vars Must Be in ~/.bashrc — Not Just Exported Inline

Exporting vars inline in the tmux send-keys command is unreliable — the vars often don't reach subprocesses. Always add them to ~/.bashrc and use source ~/.bashrc in the miner startup. The gateway will fail with "mining_address: Field required" if PEARLD_MINING_ADDRESS is not in the environment.

Always Use Full Paths for vllm and pearl-gateway

Using source .venv/bin/activate inside tmux send-keys frequently fails silently, leaving vllm not in PATH and producing "vllm: command not found". Always use /root/pearl/.venv/bin/vllm and /root/pearl/.venv/bin/pearl-gateway explicitly.

Delete Stale Socket Before Every Restart

The gateway socket at /tmp/pearlgw.sock persists after the gateway dies. On restart, if the old socket file exists, the new gateway may fail or vLLM may connect to a dead socket. Always run rm -f /tmp/pearlgw.sock before restarting.

GPU 0% — Real Issue vs Sampling Artifact

⚠️ If nvidia-smi shows 0% GPU but you catch it in occasional bursts (35→0→35→0), that MAY be a sampling artifact from the batch/wait pattern. But if RunPod dashboard also shows 0% persistently AND power draw is ~120W (vs 690W when healthy), it is a REAL problem. The fix is always the loop: use sleep 1 + long prompts.

Short Prompts Kill Mining (m Value Too Low)

The should_use_noisy_gemm() function requires m ≥ 1024 (default threshold in config.yaml). Short prompts produce small batch sizes (m < 1024) and mining is skipped entirely. Always use long prefill-heavy prompts (~150+ tokens input, max_tokens=1). Target m=5000-8000+. Power draw is the quickest sanity check: 690W = mining, 120W = not mining.
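
The threshold logic can be illustrated with a small stand-alone function. The comparison itself follows the expression that the NOISY_GEMM debug patch targets (`(m >= min_m) and (n >= min_n) and (k >= min_k)`). The default of 1024 for all three thresholds is an assumption based on the config.yaml note in Key Lessons Learned, and the k value in the usage example is made up.

```python
def should_mine(m, n, k, min_m=1024, min_n=1024, min_k=1024):
    """Mirror of the threshold check: mining GEMM is used only if every dimension is large enough."""
    return (m >= min_m) and (n >= min_n) and (k >= min_k)
```

With a long-prompt batch, `should_mine(8174, 57344, 8192)` is True, while a short-prompt batch such as `should_mine(512, 57344, 8192)` is False and the mining path is skipped.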

Community Resources

☄️ Community Resources

Independent tools built by the Pearl mining community.

☄️
lordofpearls.xyz INDEPENDENT EXPLORER FREE

First independent block explorer for Pearl — built by lordkuba

Live metrics Top miners Top holders Hashrate since genesis World node map Real-time
Visit lordofpearls.xyz