sLLM or RAG AI Server: 4x FuriosaAI RNGD NPU Server
FuriosaAI RNGD NPU-based AI Inference Server: Lower-End Specification
(Optimized for cost-efficiency and essential functionality for 4x RNGD NPUs)
This specification aims to provide a functional and stable platform for AI inference with four FuriosaAI RNGD NPUs.
Key Component: FuriosaAI RNGD NPU Accelerator Cards
- 4x FuriosaAI RNGD (Renegade) NPU:
  - Architecture: Tensor Contraction Processor (TCP)
  - Process Node: TSMC 5nm
  - Performance: 512 TOPS (INT8), 256 TFLOPS (BF16), 512 TFLOPS (FP8), 1024 TOPS (INT4)
  - Memory: 48GB HBM3
  - Memory Bandwidth: 1.5 TB/s
  - Interface: PCIe Gen5 x16
  - TDP: 150W (passively cooled; relies on system airflow)
  - Form Factor: PCIe dual-slot, full-height, 3/4 length
- Note: The NPU itself remains high-performance; the "lower spec" refers to the host server components supporting it.
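For sLLM/RAG sizing, it helps to view the four cards in aggregate. The short sketch below simply multiplies the per-card figures above by four and adds a rule-of-thumb weight-footprint estimate (parameter count x bytes per parameter, ignoring KV cache and runtime overhead); it is illustrative arithmetic, not a FuriosaAI benchmark.

```python
# Aggregate capacity of 4x RNGD cards (per-card figures from the spec above).
CARDS = 4
HBM_PER_CARD_GB = 48            # HBM3 per card
BW_PER_CARD_TBS = 1.5           # memory bandwidth per card, TB/s
INT8_TOPS_PER_CARD = 512
BF16_TFLOPS_PER_CARD = 256

total_hbm_gb = CARDS * HBM_PER_CARD_GB            # 192 GB
total_bw_tbs = CARDS * BW_PER_CARD_TBS            # 6.0 TB/s
total_int8_tops = CARDS * INT8_TOPS_PER_CARD      # 2048 TOPS
total_bf16_tflops = CARDS * BF16_TFLOPS_PER_CARD  # 1024 TFLOPS

def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    """Rule-of-thumb weight footprint: 1B params at 1 byte/param is roughly 1 GB."""
    return params_billions * bytes_per_param

print(f"Total: {total_hbm_gb} GB HBM3, {total_bw_tbs} TB/s, "
      f"{total_int8_tops} TOPS INT8, {total_bf16_tflops} TFLOPS BF16")
print(f"~70B-parameter model, INT8 weights: ~{weights_gb(70, 1):.0f} GB")
print(f"~70B-parameter model, BF16 weights: ~{weights_gb(70, 2):.0f} GB")
```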
Typical Server Platform Specifications (for integrating 4x FuriosaAI RNGD NPUs)
- NPU Quantity: 4x FuriosaAI RNGD NPU cards.
- Processor (CPU):
  - Single Intel Xeon Scalable processor (e.g., a lower-core-count 4th/5th Gen Sapphire Rapids/Emerald Rapids SKU with sufficient PCIe lanes) or a lower-core-count AMD EPYC processor (e.g., Genoa/Bergamo series with sufficient PCIe lanes).
  - Key Feature: Prioritize PCIe Gen5 x16 lane count (64-128 lanes from a single CPU) over raw core count, as long as the core count is sufficient for the OS and data handling (e.g., 16-24 cores).
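The preference for lane count over core count follows from a quick lane budget. In the sketch below, only the four x16 NPU slots come from this spec; the NVMe and NIC lane counts are illustrative assumptions.

```python
# Illustrative PCIe Gen5 lane budget for a single-socket host.
# Only the 4x x16 NPU slots are fixed by this spec; NVMe/NIC lanes are assumptions.
npu_lanes  = 4 * 16   # four RNGD cards at x16 each -> 64 lanes
nvme_lanes = 2 * 4    # two NVMe drives at x4 each  ->  8 lanes
nic_lanes  = 8        # one dual-port 10GbE adapter (often on-board instead)

total_lanes = npu_lanes + nvme_lanes + nic_lanes
print(f"NPUs: {npu_lanes}, NVMe: {nvme_lanes}, NIC: {nic_lanes} -> ~{total_lanes} lanes")
# ~80 lanes fits within a single 4th/5th Gen Xeon (80 Gen5 lanes) or EPYC 9004 (128 lanes).
```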
- Motherboard:
  - Single-socket server motherboard (Intel LGA4677 or AMD SP5/SP6 socket)
  - Key Features:
    - At least 4x PCIe Gen5 x16 physical slots, with this lane configuration supported from a single CPU.
    - Support for DDR5 Registered ECC RAM.
    - Minimum of 2x NVMe (PCIe Gen4/Gen5) SSD slots (U.2 or M.2).
    - Robust VRM and power delivery, sufficient for the chosen CPU and four 150W NPUs.
    - Integrated server management (IPMI/BMC).
- System Memory (RAM):
  - 256GB (4x 64GB or 8x 32GB) DDR5 Registered ECC RAM
- Storage:
  - Primary: 2TB NVMe PCIe Gen4 SSD (U.2 or M.2) for the operating system, AI frameworks, and core models.
  - Secondary (optional at this spec level): a 4TB enterprise SATA SSD for additional model storage can be added later if budget allows.
- Power Supply Unit (PSU):
  - 1200W-1600W, 80 PLUS Platinum certified, redundant power supplies (recommended).
  - Power Connectors: at least four PCIe auxiliary power connectors (8-pin or 12VHPWR, as required by the cards) for the NPUs.
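The 1200W-1600W range can be sanity-checked with a rough power budget. In the sketch below, only the 150W NPU TDP comes from this spec; the remaining figures are typical ballpark assumptions, not measurements.

```python
# Ballpark system power budget (only the NPU TDP is taken from the spec above;
# the other figures are typical assumptions, not measured values).
npu_w       = 4 * 150   # four RNGD cards -> 600 W
cpu_w       = 250       # one mid-range Xeon/EPYC SKU
ram_w       = 40        # 8x DDR5 RDIMMs at roughly 5 W each
drives_w    = 25        # NVMe + SATA SSDs
fans_misc_w = 100       # chassis fans, motherboard, BMC, NIC

total_w = npu_w + cpu_w + ram_w + drives_w + fans_misc_w
print(f"Estimated load: {total_w} W")                   # ~1015 W
print(f"Headroom on a 1200 W PSU: {1200 - total_w} W")  # ~185 W
```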
- Server Chassis:
  - 4U rackmount server chassis optimized for multi-accelerator cards.
  - Key Features:
    - Designed for passively cooled cards: the chassis must deliver excellent front-to-back airflow across the PCIe cards.
    - Adequate spacing for 4x dual-slot RNGD cards.
    - Standard hot-swap drive bays (e.g., 4-8 bays).
    - Basic rail kits for rack mounting.
  - Examples: a basic 4U chassis from Supermicro or a comparable OEM that prioritizes direct airflow over extra features.
- Cooling System:
  - CPU: standard server-grade active heatsink for the chosen single CPU.
  - Chassis Fans: a sufficient number of high-CFM, hot-swappable fans (e.g., 3-6) placed to drive strong front-to-back airflow through the NPU compartment.
- Network:
  - 2x 10GbE Network Interface Cards (NICs), often integrated on server motherboards.
- Software Stack:
  - Operating System: Ubuntu Server 22.04 LTS (or a minimal RHEL/CentOS Stream installation).
  - FuriosaAI SDK: essential for NPU functionality and model deployment.
  - AI Frameworks: PyTorch, TensorFlow, ONNX Runtime (as needed).
  - Containerization: Docker (Kubernetes is optional; it is mainly useful for orchestration across multiple servers).
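Once the FuriosaAI SDK and a serving stack are installed, a deployed sLLM is typically queried over an OpenAI-compatible HTTP endpoint. The sketch below is a minimal client under that assumption; the base URL, port, and model name are placeholders, and the actual serving command and model identifiers should be taken from the FuriosaAI SDK documentation.

```python
# Minimal client sketch against an OpenAI-compatible endpoint served from this box.
# Assumptions: the serving stack exposes http://localhost:8000/v1, and the model
# name below is a placeholder; verify both against the FuriosaAI SDK documentation.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local endpoint
    api_key="not-needed-locally",         # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",        # placeholder model identifier
    messages=[{"role": "user", "content": "Summarize the role of RAG in this deployment."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```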