Complete installation, configuration, health monitoring & troubleshooting
| Component | This Guide's Setup | Notes |
|---|---|---|
| GPU | 2x NVIDIA H200 SXM | Confirmed working. All expected values in this guide are for H200. |
| VRAM per GPU | 143,771 MiB (~141GB) | After model load: ~132,964 MiB GPU0 / ~131,399 MiB GPU1 |
| CUDA Version | 12.8 | Minimum: 12.4. Tested on 12.8. |
| Driver | 570.211.01 | Any 520+ should work |
| GPU Count | Exactly 2 | Guide uses --data-parallel-size 2 |
| System RAM | 64GB+ | Needed for build + model loading |
| Disk | 300GB+ persistent | Model ~140GB + builds ~50GB + chain ~5GB |
| OS | Ubuntu 22.04 | RunPod default image |
| Provider | RunPod | See Section 00a for other providers |
| GPU power at full mining | ~690W each (near 700W TDP) | This is the health indicator — if power is 120W, mining is not happening |
| GPU | Community Status | Notes |
|---|---|---|
| H200 SXM ×2 | ✅ This guide — confirmed | Reference setup for this guide |
| H100 SXM ×2 | ✅ Community confirmed | Works. 80GB VRAM each. Adjust VRAM expectations in health checks. |
| H100 NVL ×1 + H200 ×1 | ⚠️ Community reported | Mixed setup. Some users got blocks. |
| Single H200 | ⚠️ Possible | Use --data-parallel-size 1, 64 requests. Lower hashrate. |
| A100 ×2 | ❌ Not recommended | Ampere architecture — Pearl kernel targets Hopper. May not compile. |
| RTX 4090 ×2 | ❌ Insufficient VRAM | 24GB each = 48GB total. Not enough for 70B model. |
Run these immediately after SSH-ing in. If any check fails, reprovision before continuing.
nvidia-smi
df -h | sort -rh | head -8
free -h
lsb_release -a 2>/dev/null || cat /etc/os-release | head -5
curl -s --max-time 5 https://github.com > /dev/null && echo "GitHub OK" && curl -s --max-time 5 https://huggingface.co > /dev/null && echo "HuggingFace OK"
| Scenario | Peers | Impact |
|---|---|---|
| Port 44108 NOT exposed (RunPod default) | ~16 outbound only | Works fine. Block propagation slightly slower. |
| Port 44108 exposed | Up to 200 inbound+outbound | Better connectivity, faster block propagation. |
| Discord reports of 200+ peers | 200+ | These users have inbound port exposed AND are on providers with open firewall. |
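A quick local sanity check (a sketch; a local listener still does not prove inbound reachability through the provider firewall, and 44108 is this guide's p2p port):

```shell
# Is anything listening locally on the p2p port (44108)?
# Note: this only checks the local socket, not whether the provider forwards inbound traffic.
if ss -ltn 2>/dev/null | grep -q ':44108'; then
  echo "p2p port listening locally"
else
  echo "p2p port not listening locally"
fi
```

On RunPod without the port exposed, expect "not listening" to still coexist with healthy outbound-only peering.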
echo "=== GPU ===" && nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv,noheader && echo "=== DISK ===" && df -h | sort -rh | head -5 && echo "=== RAM ===" && free -h | grep Mem && echo "=== OS ===" && lsb_release -d 2>/dev/null && echo "=== NETWORK ===" && curl -s --max-time 5 https://github.com > /dev/null && echo "GitHub OK" || echo "GitHub BLOCKED"
This guide was built and tested on RunPod. The core setup is identical across providers — only storage paths and a few installation details differ. Use this table to adapt the guide for your provider.
| Provider | HF_HOME path | UV Cache path | Persistent storage | Notes |
|---|---|---|---|---|
| RunPod ✅ Tested | `/workspace/.hf` | `/workspace/.uv-cache` | `/workspace` | Deadsnakes PPA blocked — use UV for Python 3.12. Ubuntu 22.04. |
| Vast.ai | `/root/.cache/huggingface` or `/workspace/.hf` | `/root/.cache/uv` | `/workspace` (if attached) | Use Custom Template. Ubuntu 22.04 works. apt python3.12 may work via deadsnakes. |
| Lambda Labs | `/home/ubuntu/.cache/huggingface` | `/home/ubuntu/.cache/uv` | `/home/ubuntu` | Ubuntu 22.04. Python 3.12 via deadsnakes should work. Run as ubuntu, not root. |
| CoreWeave | `/mnt/data/.hf` | `/mnt/data/.uv-cache` | `/mnt/data` | Kubernetes-based. Persistent volume must be mounted manually. |
| Paperspace | `/notebooks/.hf` | `/notebooks/.uv-cache` | `/notebooks` | Ubuntu 20.04/22.04. Python 3.12 via deadsnakes. |
| Any provider | Any path with 200GB+ free space | Any writable path | Check `df -h` for largest partition | Find largest partition: `df -h \| sort -rh \| head -5` |
Replace every occurrence of /workspace/.hf with your provider's persistent storage path, and /workspace/.uv-cache with the UV cache path. The two places these appear are:
export HF_HOME=/YOUR_PROVIDER_PATH/.hf
cd /root/pearl && export UV_CACHE_DIR=/YOUR_PROVIDER_PATH/.uv-cache && export HF_HOME=/YOUR_PROVIDER_PATH/.hf && task build:miner
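To reduce copy-paste errors, one option (a sketch; `PROVIDER_PATH` and the demo path below are placeholders, not part of this guide's commands) is to set the provider path once and derive both variables from it:

```shell
# Set the persistent-storage root once; derive both paths from it.
PROVIDER_PATH=/tmp/provider-demo        # placeholder; e.g. /workspace on RunPod
export HF_HOME="${PROVIDER_PATH}/.hf"
export UV_CACHE_DIR="${PROVIDER_PATH}/.uv-cache"
mkdir -p "$HF_HOME" "$UV_CACHE_DIR"
echo "HF_HOME=$HF_HOME"
echo "UV_CACHE_DIR=$UV_CACHE_DIR"
```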
| Provider | Python 3.12 Method | Command |
|---|---|---|
| RunPod | apt blocked — use UV | uv python install 3.12 |
| Vast.ai | Try apt first, fallback to UV | apt-get install -y python3.12 || uv python install 3.12 |
| Lambda / Paperspace | apt via deadsnakes PPA | add-apt-repository ppa:deadsnakes/ppa && apt-get install -y python3.12 |
| Any provider (universal) | UV always works | uv python install 3.12 |
nvidia-smi && echo "CUDA OK" || echo "NO GPU DETECTED"
df -h | sort -rh | head -5
Pick the partition with 300GB+ free space for HF_HOME. The 70B model needs ~140GB.
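The same check can be scripted (a sketch; the target path is a placeholder, so point it at your candidate mount):

```shell
# Verify a candidate path has at least 300GB free. df -Pk (POSIX) reports KB.
TARGET=.                                 # placeholder; use e.g. /workspace
AVAIL_KB=$(df -Pk "$TARGET" | awk 'NR==2 {print $4}')
NEED_KB=$((300 * 1024 * 1024))
if [ "$AVAIL_KB" -ge "$NEED_KB" ]; then
  echo "OK: $((AVAIL_KB / 1024 / 1024))GB free on $TARGET"
else
  echo "INSUFFICIENT: $((AVAIL_KB / 1024 / 1024))GB free on $TARGET (need 300GB)"
fi
```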
| Setting | Value | Why |
|---|---|---|
| Parallelism | --data-parallel-size 2 | NOT tensor parallel — TP shards the weights, halving the GEMM n dimension (28672 vs 57344) |
| Prefix Caching | --no-enable-prefix-caching | MUST disable — caching = no GEMM = no mining |
| Chunked Prefill | --no-enable-chunked-prefill | Must disable for correct mining behavior |
| GPU Memory | --gpu-memory-utilization 0.9 | Leave 10% headroom |
| Model Length | --max-model-len 8192 | Keeps KV cache within budget — sized to fit even 80GB (H100) VRAM |
| Execution | --enforce-eager | Required for Pearl kernel |
| ZK Speed | export RAYON_NUM_THREADS=96 | Faster proof generation |
| Deep GEMM | export VLLM_USE_DEEP_GEMM=0 | Disable — conflicts with Pearl GEMM |
| Requests | 128 concurrent long-prompt requests | Long prompts (~150+ tokens) needed for m≥5000 |
| Loop pattern | sleep 1 (NOT wait) | wait causes GPU to idle between batches → 0% util |
| Request port | port 8000 ONLY | DP=2 exposes single port — port 8001 drops silently |
| Socket Count | 4 ESTAB connections | 2 per DP engine = 4 total when healthy |
| n value in NOISY_GEMM | 57344 | Confirms DP mode (TP gives 28672) |
| Node RPC | port 44107 (pearld) | pearl daemon |
| Wallet RPC | port 44207 (oyster) | wallet daemon |
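Why the n check distinguishes DP from TP: tensor parallelism shards the weight matrix across the two GPUs, so each GPU's GEMM sees half the n dimension. A quick arithmetic sketch using the table's values:

```shell
# DP keeps the full weight matrix on each GPU; TP=2 shards it, halving n per GPU.
N_DP=57344
TP_DEGREE=2
echo "n per GPU under tensor parallelism: $(( N_DP / TP_DEGREE ))"   # 28672
```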
wget -q https://go.dev/dl/go1.24.2.linux-amd64.tar.gz && tar -C /usr/local -xzf go1.24.2.linux-amd64.tar.gz && export PATH=$PATH:/usr/local/go/bin && echo 'export PATH=$PATH:/usr/local/go/bin' >> ~/.bashrc
go version
go version go1.24.2 linux/amd64

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y && source ~/.cargo/env
rustc --version
rustc 1.xx.x (xxxxxxx YYYY-MM-DD)

curl -LsSf https://astral.sh/uv/install.sh | sh && source $HOME/.local/bin/env
uv 0.x.x

sh -c "$(curl --location https://taskfile.dev/install.sh)" -- -d -b /usr/local/bin
Task version: vx.x.x

apt-get update -qq && apt-get install -y tmux
apt-get install python3.12 will fail with "Unable to locate package". Use UV to install Python 3.12 instead (UV is already installed above).

uv python install 3.12
ln -sf $(uv python find 3.12) /usr/local/bin/python3.12 && update-alternatives --install /usr/bin/python3 python3 /usr/local/bin/python3.12 1 && python3 --version
Python 3.12.x

cd /root && git clone https://github.com/pearl-research-labs/pearl.git && cd pearl
pwd → should show /root/pearl

cd /root/pearl && task build:blockchain
ls -la /root/pearl/bin/pearld /root/pearl/bin/oyster /root/pearl/bin/prlctl
cd /root/pearl && export UV_CACHE_DIR=/workspace/.uv-cache && export HF_HOME=/workspace/.hf && task build:miner
ls /root/pearl/.venv/bin/vllm && ls /root/pearl/.venv/bin/pearl-gateway
Add all required env vars to ~/.bashrc now (you will update PEARLD_MINING_ADDRESS after Step 3):
cat >> ~/.bashrc << 'EOF'
export PEARLD_RPC_URL=http://localhost:44107
export PEARLD_RPC_USER=rpcuser
export PEARLD_RPC_PASSWORD=rpcpass
export PEARLD_MINING_ADDRESS=PLACEHOLDER
export HF_HOME=/workspace/.hf
export VLLM_USE_DEEP_GEMM=0
export RAYON_NUM_THREADS=96
EOF
source ~/.bashrc && echo $VLLM_USE_DEEP_GEMM
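A more thorough check than echoing one variable (a sketch; the variable list mirrors the ~/.bashrc block above, and the eval indirection keeps it POSIX-portable):

```shell
# Report any required miner env vars that are unset or empty.
MISSING=0
for VAR in PEARLD_RPC_URL PEARLD_RPC_USER PEARLD_RPC_PASSWORD \
           PEARLD_MINING_ADDRESS HF_HOME VLLM_USE_DEEP_GEMM RAYON_NUM_THREADS; do
  VAL=$(eval "printf '%s' \"\${$VAR}\"")
  if [ -z "$VAL" ]; then
    echo "MISSING: $VAR"
    MISSING=1
  fi
done
[ "$MISSING" -eq 0 ] && echo "all required env vars set"
```

Run it in the same shell that will launch the miner, after sourcing ~/.bashrc.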
Run source ~/.bashrc before starting the miner in Step 4.

cd /root/pearl && ./bin/oyster --create
When prompted, answer as follows:
| Prompt | Answer |
|---|---|
| Do you want to add a passphrase? | No (just press Enter) — or set one you'll remember |
| Do you have an existing seed phrase? | No |
| Seed phrase shown | ⚠️ WRITE IT DOWN NOW — all 12 words in order |
| Type OK to confirm | OK |
tmux new-session -d -s node && tmux new-session -d -s miner && tmux new-session -d -s loop
tmux send-keys -t node "cd /root/pearl && ./bin/pearld --rpcuser=rpcuser --rpcpass=rpcpass --rpclisten=0.0.0.0:44107 --txindex --notls --maxpeers=200" Enter
Wait 30 seconds then verify:
cd /root/pearl && ./bin/prlctl -u rpcuser -P rpcpass -s localhost:44107 --notls getblockcount
/root/pearl/bin/oyster -u rpcuser -P pearl123 --noclienttls --noservertls --pearldusername=rpcuser --pearldpassword=rpcpass > /tmp/oyster.log 2>&1 & sleep 15 && /root/pearl/bin/prlctl -u rpcuser -P pearl123 -s localhost:44207 --wallet --notls getnewaddress
sed -i 's/PEARLD_MINING_ADDRESS=PLACEHOLDER/PEARLD_MINING_ADDRESS=YOUR_ACTUAL_ADDRESS/' ~/.bashrc && source ~/.bashrc && echo $PEARLD_MINING_ADDRESS

Confirm it prints your address before proceeding.

/root/pearl/bin/prlctl -u rpcuser -P pearl123 -s localhost:44207 --wallet --notls validateaddress YOUR_ADDRESS
rm -f /tmp/pearlgw.sock && tmux kill-session -t miner 2>/dev/null; tmux new-session -d -s miner && tmux send-keys -t miner "cd /root/pearl && source ~/.bashrc && /root/pearl/.venv/bin/pearl-gateway start > /tmp/gateway.log 2>&1 & sleep 10 && /root/pearl/.venv/bin/vllm serve pearl-ai/Llama-3.3-70B-Instruct-pearl --host 0.0.0.0 --port 8000 --max-model-len 8192 --gpu-memory-utilization 0.9 --enforce-eager --data-parallel-size 2 --no-enable-prefix-caching --no-enable-chunked-prefill" Enter
tail -5 /tmp/gateway.log

cd /root/pearl && ./bin/prlctl -u rpcuser -P rpcpass -s localhost:44107 --notls getblockchaininfo 2>/dev/null | grep -E "blocks|headers"
nvidia-smi --query-gpu=index,memory.used --format=csv,noheader
curl -s http://localhost:8000/health && echo "READY" || echo "NOT READY"
Use sleep 1, NOT wait! Using wait causes GPU to drop to 0% between batches (burst/idle pattern). sleep 1 keeps requests continuously overlapping for 90%+ GPU utilization!

tmux send-keys -t loop "COUNT=0; while true; do COUNT=\$((COUNT+1)); for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128; do curl -s http://localhost:8000/v1/chat/completions -H 'Content-Type: application/json' -d '{\"model\": \"pearl-ai/Llama-3.3-70B-Instruct-pearl\", \"messages\": [{\"role\": \"user\", \"content\": \"Write a detailed comprehensive academic essay about topic \$COUNT variant \$i covering the following aspects in depth: historical background and origins dating back centuries, mathematical foundations and theoretical frameworks, scientific principles and empirical evidence, technological applications and modern implementations, economic implications and market dynamics, social and cultural impacts on society, philosophical interpretations and ethical considerations, future prospects and emerging research directions, comparative analysis with related fields, and practical case studies with real world examples.\"}], \"max_tokens\": 1}' > /dev/null & done; sleep 1; done" Enter
nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader
tmux capture-pane -t miner -p -S -5000 | grep "NOISY_GEMM" | tail -3
Use -S -5000 (not -S -50) to look back far enough — the buffer fills with other logs quickly.

The vLLM metrics endpoint is the most reliable way to confirm everything is working correctly:
curl -s http://localhost:8000/metrics | grep -E "num_requests_running|cache_hit" | grep -v "^#\|reason\|external\|mm_cache"
echo "=== TMUX ===" && tmux ls && echo "=== SOCKETS ===" && ss -x | grep pearlgw | wc -l && echo "=== VLLM ===" && pgrep -f "vllm serve" | wc -l && echo "=== GPU ===" && nvidia-smi --query-gpu=index,utilization.gpu,memory.used,power.draw --format=csv,noheader && echo "=== MINING ADDRESS ===" && cat /proc/$(pgrep -f "pearl-gateway" | head -1)/environ | tr '\0' '\n' | grep "MINING_ADDRESS" && echo "=== NOISY_GEMM ===" && tmux capture-pane -t miner -p -S -5000 | grep "NOISY_GEMM" | grep "n=57344" | tail -3 && echo "=== LOOP ===" && tmux capture-pane -t loop -p -S -3 | tail -2 && echo "=== CURL JOBS ===" && pgrep -f "curl.*localhost:8000" | wc -l && echo "=== REQUESTS RUNNING ===" && curl -s http://localhost:8000/metrics | grep "num_requests_running" | grep -v "^#\|reason" | awk '{print $2}' | tr '\n' ' ' && echo "" && echo "=== CACHE HITS ===" && curl -s http://localhost:8000/metrics | grep "cache_hit" | grep -v "^#\|external\|mm_cache" | awk '{print $2}' | tr '\n' ' ' && echo "" && echo "=== PEERS ===" && cd /root/pearl && ./bin/prlctl -u rpcuser -P rpcpass -s localhost:44107 --notls getpeerinfo 2>/dev/null | grep "addr" | wc -l && echo "=== BLOCK COUNT ===" && ./bin/prlctl -u rpcuser -P rpcpass -s localhost:44107 --notls getblockcount 2>/dev/null && echo "=== WATCHDOG ===" && cat /tmp/loop_watchdog.log 2>/dev/null || echo "No restarts yet" && echo "=== BLOCKS ===" && tmux capture-pane -t miner -p -S -5000 | grep -i "block accepted\|Block found\|proof"
| Check | Healthy Value | Action if Wrong |
|---|---|---|
| TMUX SESSIONS | miner, loop, node, watchdog | Recreate missing sessions |
| SOCKETS | 4 | Restart miner — gateway/vLLM disconnected |
| PEERS | 8-16 on RunPod (normal) | 16 = normal without exposed port 44108. Not a problem. See Step 0b. |
| VLLM | 1 | Restart miner tmux session |
| GPU utilization | 90-98% both GPUs | Restart loop — use sleep 1 + long prompts |
| GPU power draw | 600-690W each (near 700W TDP) | Low power = GPU idle = loop stalled |
| GPU memory | ~132GB each | vLLM crashed — restart miner |
| NOISY_GEMM m value | 5000-8000+ | Use longer prompts in loop. Low m = less mining throughput. |
| NOISY_GEMM n value | 57344 | Must be 57344 — confirms DP mode working |
| NOISY_GEMM workers | Both Worker PIDs firing | Only one firing = one GPU idle — restart loop |
| CURL JOBS | 500+ jobs in flight | Under 10 = loop stalled — watchdog should auto-restart |
| REQUESTS RUNNING | 30-50 per engine (balanced) | 0 on one engine = unbalanced — restart loop |
| CACHE HITS | 0.0 | Prompts too similar — randomize more |
| LOOP | Many PIDs visible, large count number | Restart loop or check watchdog log |
| MINING ADDRESS | Your prl1p... address | Kill gateway and restart with correct address in ~/.bashrc |
| WATCHDOG | No restarts yet / shows timestamps | Not running → set up loop watchdog (Section 08b) |
This is NOT a sampling artifact if RunPod dashboard also shows 0%. Root cause is almost always the request loop — either using wait instead of sleep 1, or short prompts that produce m values below the 1024 threshold.
curl -s http://localhost:8000/metrics | grep "num_requests_running" | grep -v "^#\|reason" | awk '{print $2}' | tr '\n' ' '
Fix: Kill loop, restart with sleep 1 (not wait) and long prompts (~150+ tokens). See Step 4 loop command.
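The difference between the two patterns can be seen with a toy stand-in, where sleep 0.2 plays the role of a curl request (timings here are illustrative, not tuned values):

```shell
# Pattern A (BAD): 'wait' blocks the loop until every request in the batch returns,
# leaving the GPU idle between batches.
for i in 1 2 3; do sleep 0.2 & done
wait
echo "batch fully drained: this is the idle gap"

# Pattern B (GOOD): a fixed 'sleep 1' launches the next batch on a steady cadence,
# so new requests overlap with ones still in flight.
for i in 1 2 3; do sleep 0.2 & done
sleep 1
echo "next batch fires while earlier requests may still be running"
```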
DeepGEMM is trying to JIT-compile CUDA kernels and failing. Root cause: VLLM_USE_DEEP_GEMM env var is not set or not reaching the vLLM process.
echo $VLLM_USE_DEEP_GEMM
The blockchain node is still syncing. vLLM starts but immediately crashes because there is no block to mine.
cd /root/pearl && ./bin/prlctl -u rpcuser -P rpcpass -s localhost:44107 --notls getblockchaininfo 2>/dev/null | grep -E "blocks|headers"
Wait until blocks == headers before starting vLLM. Can take 5-15 minutes on first launch.
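The wait can be scripted (a sketch: getinfo below is a stub standing in for the prlctl getblockchaininfo call from this guide, so the snippet is self-contained; swap in the real command):

```shell
# Poll until blocks == headers before starting vLLM. getinfo stubs:
#   ./bin/prlctl -u rpcuser -P rpcpass -s localhost:44107 --notls getblockchaininfo
getinfo() { echo '{"blocks": 4200, "headers": 4200}'; }

until INFO=$(getinfo) && \
      [ "$(echo "$INFO" | grep -o '"blocks": *[0-9]*' | tr -dc 0-9)" = \
        "$(echo "$INFO" | grep -o '"headers": *[0-9]*' | tr -dc 0-9)" ]; do
  echo "still syncing..."
  sleep 30
done
echo "synced: safe to start vLLM"
```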
The PEARLD_MINING_ADDRESS env var is not reaching the gateway process. This happens when env vars are only exported inline rather than in ~/.bashrc, or when the miner tmux session was created before the vars were set.
echo $PEARLD_MINING_ADDRESS
If empty: add to ~/.bashrc, source it, then kill and recreate the miner tmux session before restarting.
source .venv/bin/activate often fails silently inside tmux send-keys, so vllm is not in PATH.
Fix: Always use FULL PATHS: /root/pearl/.venv/bin/vllm and /root/pearl/.venv/bin/pearl-gateway instead of relying on venv activation.
Stale socket file from previous run. Gateway creates /tmp/pearlgw.sock and won't overwrite it.
rm -f /tmp/pearlgw.sock && echo "cleared"
Always delete the socket before restarting. Add to all restart procedures.
pkill -9 -f "pearl-gateway" && pkill -9 -f "vllm" && pkill -9 -f "EngineCore" && sleep 5
Then restart miner with full command from Step 4.
Gateway and vLLM are not connected. Happens when they start in separate sessions.
pkill -9 -f "pearl-gateway" && pkill -9 -f "vllm" && pkill -9 -f "EngineCore" && sleep 5
Restart BOTH gateway and vLLM together in the SAME miner session.
TP mode is active instead of DP. Restart with --data-parallel-size 2 flag.
pgrep -f "vllm serve" | xargs -I{} cat /proc/{}/cmdline | tr '\0' ' ' | grep "data-parallel"
cat /proc/$(pgrep -f "pearl-gateway" | head -1)/environ | tr '\0' '\n' | grep "MINING_ADDRESS"
If wrong: pkill -f "pearl-gateway" then restart miner with correct PEARLD_MINING_ADDRESS.
If vLLM crashes repeatedly, add a watchdog that monitors and restarts it automatically. Note: this is separate from the loop watchdog below.
cat > /root/pearl/watchdog.sh << 'EOF'
#!/bin/bash
while true; do
VLLM=$(pgrep -f "vllm serve" | wc -l)
SOCK=$(ss -x | grep pearlgw | wc -l)
if [ "$VLLM" -eq 0 ] || [ "$SOCK" -lt 2 ]; then
echo "$(date) - Restarting miner..." >> /tmp/watchdog.log
pkill -9 -f "pearl-gateway"; pkill -9 -f "vllm"; pkill -9 -f "EngineCore"
sleep 5
rm -f /tmp/pearlgw.sock
cd /root/pearl && source ~/.bashrc && \
/root/pearl/.venv/bin/pearl-gateway start > /tmp/gateway.log 2>&1 & sleep 10 && \
/root/pearl/.venv/bin/vllm serve pearl-ai/Llama-3.3-70B-Instruct-pearl \
--host 0.0.0.0 --port 8000 --max-model-len 8192 \
--gpu-memory-utilization 0.9 --enforce-eager \
--data-parallel-size 2 --no-enable-prefix-caching \
--no-enable-chunked-prefill &
sleep 900
fi
sleep 60
done
EOF
chmod +x /root/pearl/watchdog.sh && tmux new-session -d -s watchdog && tmux send-keys -t watchdog "/root/pearl/watchdog.sh" Enter && echo "✓ Miner watchdog running"
The request loop can stall silently — curl jobs drop to 0, GPU goes to 0%, but vLLM stays running. This watchdog checks every 60 seconds and auto-restarts the loop if fewer than 10 curl jobs are running. Set this up on every miner.
cat > /root/loop_watchdog.sh << 'EOF'
#!/bin/bash
while true; do
CURL_COUNT=$(pgrep -f "curl.*localhost:8000" | wc -l)
if [ "$CURL_COUNT" -lt 10 ]; then
echo "$(date) - Loop stalled (${CURL_COUNT} curl jobs), restarting..." >> /tmp/loop_watchdog.log
tmux send-keys -t loop C-c 2>/dev/null
sleep 2
pkill -f "curl.*localhost:8000" 2>/dev/null
sleep 2
tmux send-keys -t loop "COUNT=0; while true; do COUNT=\$((COUNT+1)); for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128; do curl -s http://localhost:8000/v1/chat/completions -H 'Content-Type: application/json' -d '{\\\"model\\\": \\\"pearl-ai/Llama-3.3-70B-Instruct-pearl\\\", \\\"messages\\\": [{\\\"role\\\": \\\"user\\\", \\\"content\\\": \\\"Write a detailed comprehensive academic essay about topic \$COUNT variant \$i covering the following aspects in depth: historical background and origins dating back centuries, mathematical foundations and theoretical frameworks, scientific principles and empirical evidence, technological applications and modern implementations, economic implications and market dynamics, social and cultural impacts on society, philosophical interpretations and ethical considerations, future prospects and emerging research directions, comparative analysis with related fields, and practical case studies with real world examples.\\\"}], \\\"max_tokens\\\": 1}' > /dev/null & done; sleep 1; done" Enter
echo "$(date) - Loop restarted" >> /tmp/loop_watchdog.log
fi
sleep 60
done
EOF
chmod +x /root/loop_watchdog.sh && tmux new-session -d -s watchdog && tmux send-keys -t watchdog "/root/loop_watchdog.sh" Enter && echo "✓ Loop watchdog running" && tmux ls | grep watchdog
cat /tmp/loop_watchdog.log 2>/dev/null || echo "No restarts yet"
tmux capture-pane -t miner -p -S -50000 | grep -i "block accepted\|Block found\|proof\|submit"
https://explorer.pearlresearch.ai/address/YOUR_MINING_ADDRESS
pgrep -f "vllm serve" | wc -l && ss -x | grep pearlgw | wc -l && nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader
pkill -9 -f "pearl-gateway"; pkill -9 -f "vllm"; pkill -9 -f "EngineCore"; pkill -9 -f "Worker"; sleep 3 && rm -f /tmp/pearlgw.sock && tmux kill-session -t miner 2>/dev/null; tmux new-session -d -s miner && tmux send-keys -t miner "cd /root/pearl && source ~/.bashrc && /root/pearl/.venv/bin/pearl-gateway start > /tmp/gateway.log 2>&1 & sleep 10 && /root/pearl/.venv/bin/vllm serve pearl-ai/Llama-3.3-70B-Instruct-pearl --host 0.0.0.0 --port 8000 --max-model-len 8192 --gpu-memory-utilization 0.9 --enforce-eager --data-parallel-size 2 --no-enable-prefix-caching --no-enable-chunked-prefill" Enter
tmux send-keys -t loop C-c
Then send the full loop command from Step 4.
/root/pearl/bin/oyster -u rpcuser -P pearl123 --noclienttls --noservertls --pearldusername=rpcuser --pearldpassword=rpcpass > /tmp/oyster.log 2>&1 & sleep 15 && /root/pearl/bin/prlctl -u rpcuser -P pearl123 -s localhost:44207 --wallet --notls getbalance
tmux capture-pane -t miner -p -S -20 | grep -i "download\|Downloading\|fetching"
Add a print statement to confirm NOISY_GEMM is being called:
python3 -c "
with open('/root/pearl/miner/vllm-miner/src/vllm_miner/config.py', 'r') as f:
content = f.read()
old = ' return (m >= min_m) and (n >= min_n) and (k >= min_k)'
new = ''' result = (m >= min_m) and (n >= min_n) and (k >= min_k)
if result:
print(f\"NOISY_GEMM_CALLED: m={m} n={n} k={k}\", flush=True)
return result'''
content = content.replace(old, new)
with open('/root/pearl/miner/vllm-miner/src/vllm_miner/config.py', 'w') as f:
f.write(content)
print('Patched!')
"
tmux capture-pane -t miner -p -S -50 | grep "NOISY_GEMM" | tail -3

| Difficulty | Expected Block Time (2x H200) | Status |
|---|---|---|
| ~29,000 | ~1 block/hour | Early network (April 27, 2026) |
| ~68,000 | ~2 hours/block | Day 3 |
| ~115,000 | ~4 hours/block | Day 4 |
| >150,000 | 6-8+ hours/block | Highly competitive |
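For planning, the block times above convert to rough daily yields (arithmetic only; actual results vary with difficulty changes and luck):

```shell
# Rough blocks/day at each observed block time (2x H200 figures from the table).
for HOURS in 1 2 4 8; do
  echo "${HOURS}h/block -> $(( 24 / HOURS )) blocks/day"
done
```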
cd /root/pearl && ./bin/prlctl -u rpcuser -P rpcpass -s localhost:44107 --notls getblockchaininfo 2>/dev/null | grep -E "blocks|difficulty"
tmux kill-session -t vllm 2>/dev/null; echo "done"
Add --debug flag to gateway for more verbose logs including block submissions:
pearl-gateway start
pearl-gateway --debug start
tmux send-keys -t loop C-c && sleep 2 && pkill -f "curl.*localhost:8000" && sleep 2 && tmux send-keys -t loop "COUNT=0; while true; do COUNT=\$((COUNT+1)); for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128; do curl -s http://localhost:8000/v1/chat/completions -H 'Content-Type: application/json' -d '{\"model\": \"pearl-ai/Llama-3.3-70B-Instruct-pearl\", \"messages\": [{\"role\": \"user\", \"content\": \"Write a detailed comprehensive academic essay about topic \$COUNT variant \$i covering the following aspects in depth: historical background and origins dating back centuries, mathematical foundations and theoretical frameworks, scientific principles and empirical evidence, technological applications and modern implementations, economic implications and market dynamics, social and cultural impacts on society, philosophical interpretations and ethical considerations, future prospects and emerging research directions, comparative analysis with related fields, and practical case studies with real world examples.\"}], \"max_tokens\": 1}' > /dev/null & done; sleep 1; done" Enter
cd /root/pearl && ./bin/prlctl -u rpcuser -P rpcpass -s localhost:44107 --notls getpeerinfo 2>/dev/null | grep "addr" | wc -l
| Critical Mistake | Consequence | Fix |
|---|---|---|
| Using `wait` in request loop | GPU goes to 0% between batches — burst/idle pattern, very inefficient | Use `sleep 1` instead — keeps requests continuously overlapping |
| Sending requests to port 8001 | DP=2 only exposes port 8000 — port 8001 requests are dropped | Always send all requests to port 8000 only |
| Using --tensor-parallel-size 2 | Reduces n to 28672, less mining efficiency | Use --data-parallel-size 2 |
| Prefix caching enabled | Same prompts cached — NO GEMM = NO MINING | Always use --no-enable-prefix-caching |
| Gateway in separate session from vLLM | Socket not connected, env vars not inherited | Start both in same tmux miner session |
| Sending same prompt repeatedly | KV cache kicks in, GEMM skipped entirely | Randomize with COUNT and i variables |
| config.yaml thresholds at 1 | Overhead without benefit for our matrix sizes | Keep at 1024 (default) |
| Not verifying mining address | Blocks could go to wrong wallet | Always validateaddress + check /proc environ |
| MINER_DEBUG env vars | Don't reach EngineCore subprocess | Use PEARL_LOG_LEVEL=DEBUG instead |
| Service | Username | Password | Port |
|---|---|---|---|
| pearld node (prlctl) | rpcuser | rpcpass | 44107 |
| oyster wallet (prlctl --wallet) | rpcuser | pearl123 | 44207 |
Error creating a default config file: open /root/.oyster/oyster.conf: no such file or directory
Error creating a default config file: open /root/.pearld/pearld.conf: no such file or directory
Warning: Running on mainnet with --noclienttls is not recommended
Warning: Running on mainnet with --noservertls is not recommended
The startup command uses pearl-gateway start & sleep 10 && vllm serve ...
The & runs gateway in background, sleep 10 gives it time to create the socket, then vLLM starts and connects to it. If vLLM starts before the socket exists, they won't connect.
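A fixed sleep 10 is a guess; a hedged alternative is to poll for the socket with a timeout (a sketch: the 15-try/1s budget is an assumption, and /tmp/pearlgw.sock is the path this guide uses):

```shell
# Wait up to ~15s for the gateway's unix socket to appear before launching vLLM.
SOCK=/tmp/pearlgw.sock
TRIES=15
while [ "$TRIES" -gt 0 ] && [ ! -S "$SOCK" ]; do
  sleep 1
  TRIES=$((TRIES - 1))
done
if [ -S "$SOCK" ]; then
  echo "gateway socket ready"
else
  echo "timed out waiting for gateway socket"
fi
```

If it times out, check /tmp/gateway.log before starting vLLM rather than letting vLLM connect to nothing.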
The pearl-ai model downloaded fine without an HF token in our setup. If you get auth errors:
export HF_TOKEN=your_token_here
Do not use pgrep -f "api_server" to detect vLLM: it returns 0 even when vLLM IS running. Always use pgrep -f "vllm serve" instead.

By default tmux only stores a limited scroll buffer. Block activity messages from hours ago may not appear in tmux capture-pane. The explorer is more reliable for historical block confirmation.
Running getnewaddress multiple times generates different addresses — all from the same seed phrase, all recoverable. But only one address is set as the mining address at a time. The second address generated (prl1p8jt0...) is a valid backup address from the same wallet.
The request loop can stall without any error message. Curl jobs drop to 0, GPU goes to 0%, but vLLM stays running and appears healthy. This happens because bash accumulates too many background jobs over time.
Signs: GPU 0% on RunPod dashboard, power draw drops to ~120W, NOISY_GEMM stops firing in tmux buffer, curl job count is 0.
Fix: Kill loop, restart it. Always set up the loop watchdog (Section 08b) to handle this automatically — it checks every 60 seconds and restarts if curl jobs drop below 10.
Sometimes requests distribute unevenly between the two DP engines — one engine gets 35 requests, the other gets 0-7. This shows as low m values on one Worker and lower GPU utilization. Root cause: loop stalled and restarted unevenly.
Fix: Restart the loop cleanly. Kill all curl jobs first, verify 0 remaining, then restart. The engines rebalance within the next batch.
curl -s http://localhost:8000/metrics | grep "num_requests_running" | grep -v "^#\|reason" | awk '{print $2}' | tr '\n' ' '

Both engines should show similar numbers.

On RunPod (and most cloud providers), inbound connections are blocked by default. Your node can connect OUT to other peers but other nodes cannot connect IN to you. This limits you to ~8-16 outbound peers regardless of your --maxpeers setting.
The fix is exposing port 44108 before deploying your pod (see Step 0b). If already deployed, wait until next natural restart.
Exporting vars inline in the tmux send-keys command is unreliable — the vars often don't reach subprocesses. Always add them to ~/.bashrc and use source ~/.bashrc in the miner startup. The gateway will fail with "mining_address: Field required" if PEARLD_MINING_ADDRESS is not in the environment.
Using source .venv/bin/activate inside tmux send-keys frequently fails silently, leaving vllm not in PATH and producing "vllm: command not found". Always use /root/pearl/.venv/bin/vllm and /root/pearl/.venv/bin/pearl-gateway explicitly.
The gateway socket at /tmp/pearlgw.sock persists after the gateway dies. On restart, if the old socket file exists, the new gateway may fail or vLLM may connect to a dead socket. Always run rm -f /tmp/pearlgw.sock before restarting.
The should_use_noisy_gemm() function requires m ≥ 1024 (default threshold in config.yaml). Short prompts produce small batch sizes (m < 1024) and mining is skipped entirely. Always use long prefill-heavy prompts (~150+ tokens input, max_tokens=1). Target m=5000-8000+. Power draw is the quickest sanity check: 690W = mining, 120W = not mining.
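One way to sanity-check prompt sizing is a back-of-envelope estimate (an assumption, not the kernel's exact math: it treats m as roughly the total prefill tokens batched across concurrent requests, which is consistent with the observed 5000-8000 range):

```shell
# Estimated m from prompt length x per-engine concurrency (assumed relationship).
TOKENS_PER_PROMPT=150        # long prefill-heavy prompt, per the guidance above
CONCURRENT_PER_ENGINE=40     # requests running, from the metrics endpoint
M_EST=$(( TOKENS_PER_PROMPT * CONCURRENT_PER_ENGINE ))
echo "estimated m: $M_EST (threshold 1024, target 5000-8000+)"
```

With 150-token prompts and ~40 requests running per engine this lands at 6000, comfortably in the target band; short prompts or a stalled loop drop it below the 1024 threshold.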