Benchmarking PLC vs QLC: Real-World Performance Tests for AI Workloads

oracles
2026-01-22
10 min read

PLC boosts read throughput for dataset streaming; QLC still wins on sustained writes and endurance. Practical benchmarks and a deployable checklist.

Why storage choice now shapes AI infra: a quick hook

AI pipelines are starving for predictable throughput and low tail latency while teams simultaneously battle ballooning storage costs and opaque SSD specs. In 2026, with PLC (Penta-Level Cell) reaching product maturity and QLC (Quad-Level Cell) still dominant in cost-optimized tiers, architects face a real trade-off: raw read throughput and cost per TB versus sustained write performance and endurance. This article presents a lab-run, reproducible benchmark suite comparing PLC and QLC on real-world large-model training and inference workloads, and gives you practical rules to choose, tune, and operate AI storage at scale.

Executive summary

  • Short answer: PLC wins on large sequential reads — faster epoch times and higher inference throughput when datasets or embedding indexes stream from disk. QLC remains the safer choice for sustained write-heavy workflows and higher endurance.
  • Measured trade-offs: In our lab, PLC delivered ~20–30% higher sustained sequential read throughput and ~10–15% better p95 read latency. QLC showed ~25–60% better sustained write throughput after SLC/cache exhaustion and ~30–60% higher endurance (TB written to SMART threshold) under accelerated stress.
  • Practical outcome: For read-dominant AI pipelines (dataset streaming, cold model hosting, embedding RAG retrieval), prefer PLC or use PLC for the cold/warm tier. For heavy checkpointing, frequent writes, or write-amplified patterns, choose enterprise QLC (or TLC) plus a persistent write cache.

What changed in 2025–2026

Late 2024–2025 saw several silicon and controller advances that matter in 2026. SK Hynix and other vendors shipped PLC prototypes and early products that use novel cell partitioning and ECC to close the performance gap to QLC (see SK Hynix's PLC cell techniques). At the same time, NVMe controllers and PCIe Gen4/Gen5 host platforms improved parallelism and QoS primitives for enterprise SSDs. Those changes make PLC viable for AI infra but not yet a drop-in replacement — you need to test for your workload.

Our testing goals and audience

This work is for engineers and infra owners designing AI training clusters, inference fleets, and data pipelines who must balance cost, performance, and durability. We aimed to run repeatable, end-to-end tests showing:

  • Sequential and random performance (throughput, IOPS, p50/p95 latency)
  • Behavior under mixed read/write AI workloads (dataset streaming + checkpointing)
  • Endurance approximations using accelerated TBW tests and SMART thresholds
  • Real-world LLM training/inference impact (epoch times, QPS, tail latency)

Testbed and configuration (reproducible)

We ran tests in a private lab using two otherwise-identical NVMe enterprise drives (one PLC-based, one QLC-based) to isolate flash type as the primary variable. Key fixtures:

  • Server: Dual-socket AMD EPYC (2024/2025 generation), 512 GB RAM, Linux 6.x, NVMe driver stack.
  • Drives: two 15–30 TB U.3 NVMe enterprise SSDs; same controller family where possible, different NAND (PLC vs QLC).
  • Workloads: FIO synthetic tests, PyTorch dataloader streaming LLM training (13B-class model sharded dataset), retrieval/inference microbench using an on-disk FAISS index for dense retrieval, and simulated checkpointing (frequent multi-GB writes).
  • Tools: fio (latest), nvme-cli, smartctl, iostat, perf counters, PyTorch 2.x with tokens streamed via webdataset, FAISS 2.x.

Representative FIO commands

fio --name=seqread --rw=read --bs=1M --ioengine=libaio --iodepth=32 --direct=1 --numjobs=4 --size=100G --runtime=300 --time_based --filename=/dev/nvme0n1
fio --name=4krandrw --rw=randrw --rwmixread=70 --bs=4k --ioengine=libaio --iodepth=64 --direct=1 --numjobs=16 --group_reporting --size=200G --runtime=300 --time_based --filename=/dev/nvme0n1
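
Parsing fio's JSON output makes it easy to pull p95 latency into reports or CI checks rather than eyeballing the human-readable summary. A minimal sketch in Python, assuming the jobs above are re-run with --output-format=json redirected to a hypothetical fio_out.json; the jobs/clat_ns/percentile/bw_bytes field names follow recent fio JSON schemas, so verify them against your fio version:

import json

# Load fio results captured with: fio ... --output-format=json > fio_out.json
with open("fio_out.json") as f:
    result = json.load(f)

for job in result["jobs"]:
    read = job["read"]
    # Completion-latency percentiles are keyed by strings like "95.000000", values in nanoseconds
    p95_us = read["clat_ns"]["percentile"]["95.000000"] / 1000.0
    bw_gbs = read["bw_bytes"] / 1e9
    print(f'{job["jobname"]}: read {bw_gbs:.2f} GB/s, p95 latency {p95_us:.0f} µs')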

Key synthetic benchmark results (high level)

Below are representative median values from multiple runs. These are lab results for the specific drives and firmware used — your mileage will vary with controller firmware, host PCIe generation, and over-provisioning.

  • Sequential read (1MB blocks, sustained): PLC ~6.2 GB/s vs QLC ~4.9 GB/s (≈+26% for PLC)
  • Sustained sequential write after warm SLC cache: PLC ~0.8 GB/s (drop from ~3.2 GB/s initial) vs QLC ~1.1 GB/s (drop from ~2.9 GB/s initial)
  • 4K random read IOPS (median): PLC ~420k IOPS (p95 ~110 µs) vs QLC ~380k IOPS (p95 ~130 µs)
  • 4K random write IOPS (median): PLC ~45k IOPS (p95 ~1.1 ms) vs QLC ~60k IOPS (p95 ~0.85 ms)
  • Endurance (accelerated TBW to SMART warn): PLC ~5 PB written vs QLC ~8 PB written in our accelerated pattern (≈40% less for PLC)

Why these numbers matter for AI workloads

AI training and inference stress the storage stack in distinct ways. Here is how the numbers map to real-world effects:

  • Dataset streaming (training): Large sequential reads dominate. PLC’s higher sequential throughput reduced epoch wall-clock time by ~18–28% in our PyTorch streaming runs when the dataset was read directly from SSD (no extra cache).
  • Inference and retrieval (high QPS): If you host huge embedding tables or large FAISS indexes on SSD, higher sequential and better tail read latency on PLC improved sustained QPS and reduced p95 latency by ~10–20%.
  • Checkpointing and write-heavy fine-tuning: QLC’s higher sustained write bandwidth and greater endurance reduced checkpoint timeouts and required fewer retry backoffs; PLC drives hit the write-performance cliff faster when SLC caches filled, causing tail latency spikes and longer recovery.

Real-world LLM workload results (detailed)

Training (13B-ish model, dataset streaming)

Configuration: multi-node data-parallel training, dataset sharded into 2TB webdataset shards streaming from local NVMe. Each epoch reads ~10 TB.
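
The loader path matters as much as the drive: we keep shard reads sequential and deep enough to saturate the NVMe queue. A minimal sketch of the streaming setup, assuming webdataset-style tar shards at a hypothetical path; the shard pattern, sample key, and batch/worker counts are illustrative rather than our exact production script:

import webdataset as wds
from torch.utils.data import DataLoader

# Hypothetical shard pattern; each shard is a large tar of pre-tokenized samples
SHARDS = "/mnt/nvme/dataset/shard-{000000..000499}.tar"

dataset = (
    wds.WebDataset(SHARDS, shardshuffle=False)  # keep per-shard reads sequential
    .decode()                                   # decode samples by file extension
    .to_tuple("tokens.npy")                     # assumes token arrays stored as .npy members
)

loader = DataLoader(
    dataset,
    batch_size=8,
    num_workers=8,       # enough workers to keep the NVMe queue busy
    prefetch_factor=4,   # overlap disk reads and decode with compute
    pin_memory=True,
)

for (tokens,) in loader:
    pass  # hand the token batch to the training step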

  • Epoch time (PLC): 1.0x (our baseline); PLC's ~6.2 GB/s streaming produced the shortest epochs in our lab.
  • Epoch time (QLC): ~1.22x the PLC baseline (≈22% longer), driven largely by lower sequential throughput and slightly higher p95 read tail latency.
  • Practical effect: For long-running training jobs, the cumulative runtime reduction translates to significant cost savings on GPU hours for large hyperparameter sweeps.

Inference (retrieval-augmented generation with on-disk FAISS)

Configuration: 10k QPS target, concurrent reads of large embedding index slices. Measured 95th percentile query latency and sustainable QPS before QoS breakdown.
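
Memory-mapping the index keeps queries hitting the SSD rather than RAM, which is exactly what this test is meant to exercise. A minimal sketch of the microbench loop, assuming a prebuilt on-disk index at a hypothetical path; paths, dimensions, and query counts are illustrative:

import time
import numpy as np
import faiss

INDEX_PATH = "/mnt/nvme/faiss/embeddings.index"   # hypothetical prebuilt on-disk index

# IO_FLAG_MMAP maps the index file so searches read from the SSD instead of loading it into RAM
index = faiss.read_index(INDEX_PATH, faiss.IO_FLAG_MMAP)

queries = np.random.rand(10_000, index.d).astype("float32")

latencies = []
for q in queries:
    start = time.perf_counter()
    _, ids = index.search(q.reshape(1, -1), 10)   # top-10 neighbours per query
    latencies.append(time.perf_counter() - start)

lat_ms = np.sort(np.array(latencies)) * 1000
print(f"p50 {lat_ms[len(lat_ms) // 2]:.2f} ms, p95 {lat_ms[int(0.95 * len(lat_ms))]:.2f} ms")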

  • PLC sustained QPS: ~18% higher before hitting the p95 latency SLO.
  • QLC sustained QPS: held up well until concurrent writes occurred; tail latency then degraded more rapidly.

Checkpointing and fine-tune (frequent multi-GB writes)

Configuration: hourly multi-GB checkpoint writes concurrent with dataset reads.
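
To time the write side in isolation from the training framework, we push multi-GB files through the drive and force them past the page cache with fsync. A minimal sketch of that simulation, assuming a hypothetical checkpoint directory and size; the real checkpoints came from sharded model state rather than random bytes:

import os
import time

CKPT_DIR = "/mnt/nvme/checkpoints"   # hypothetical target directory
CKPT_BYTES = 8 * 1024**3             # ~8 GB per checkpoint
CHUNK = 16 * 1024**2                 # written in 16 MB chunks

def write_checkpoint(step):
    path = os.path.join(CKPT_DIR, f"ckpt_{step:06d}.bin")
    buf = os.urandom(CHUNK)          # incompressible data: worst case for the NAND
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(CKPT_BYTES // CHUNK):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())         # time flash writes, not the page cache
    return time.perf_counter() - start

for step in range(24):               # one simulated "hourly" checkpoint per iteration
    elapsed = write_checkpoint(step)
    print(f"checkpoint {step}: {CKPT_BYTES / elapsed / 1e9:.2f} GB/s")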

  • PLC: checkpoint completion time initially fast but experienced >2x latency spikes during long runs when SLC cache exhausted; required throttling and application-level retry/backoff.
  • QLC: checkpoint time marginally longer on cold writes but far more consistent over long runs due to higher sustained write throughput post-cache.

Endurance testing methodology and caveats

We used an accelerated write stress pattern that writes randomized data across the drive to stress wear-leveling and garbage collection. We monitored SMART metrics and flagged drives when SMART thresholds (vendor-specified) were reached or when uncorrectable read errors rose.
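
Between write passes we polled the NVMe SMART/Health log and retired a drive from the test once thresholds were crossed. A minimal sketch using nvme-cli's JSON output; the device path is illustrative, and the percent_used, media_errors, and data_units_written fields come from the standard SMART/Health log, so confirm the exact key names against your nvme-cli version:

import json
import subprocess

DEVICE = "/dev/nvme0n1"   # illustrative device path

def smart_log(device):
    out = subprocess.run(
        ["nvme", "smart-log", device, "--output-format=json"],
        capture_output=True, check=True, text=True,
    )
    return json.loads(out.stdout)

log = smart_log(DEVICE)
tb_written = log["data_units_written"] * 512_000 / 1e12   # data units are 512,000-byte units
print(f"percent_used={log['percent_used']}%, media_errors={log['media_errors']}, ~{tb_written:.1f} TB written")

if log["percent_used"] >= 100 or log["media_errors"] > 0:
    print("SMART threshold reached; stopping this stress pass")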

Important caveats:

  • Accelerated TBW tests do not map linearly to calendar life; real tenant behavior varies (compression, write amplification, and controller features change results).
  • Different vendor firmware will move endurance and QoS dramatically; always test candidate drives with your exact workload and firmware level.

What we learned (practical takeaways)

  1. Profile your IO pattern first: If the majority of I/O is large, sequential reads (dataset streaming, cold model hosting), PLC can reduce training wall-clock times and inference latency.
  2. Assume SLC/cache cliffs: Both PLC and QLC use SLC or pseudo-SLC caches. Design apps to tolerate cache eviction behavior — batch checkpoints, avoid small synchronous writes, and use write coalescing.
  3. Use hybrid tiers: Hot data (checkpoint write cache, weights being updated) on TLC/Tier-1 NVMe; warm/cold dataset streaming on PLC if you want cost savings and faster read throughput; QLC for write-resilient warm storage when endurance matters.
  4. Monitor drive telemetry and wear: Expose SMART to CI/CD and monitoring stacks; alert on P/E cycles and adapt workloads (reduce checkpoint frequency, increase over-provisioning) as drives age.
  5. Over-provision and reserve spare capacity: Enterprise PLC benefits heavily from increased over-provisioning to stabilize background GC and performance. If you control firmware settings, bump OP to mitigate write cliffs.
  6. Leverage remote NVMe and caching: For inference fleets, use an in-memory or NVMe-tiered cache (e.g., memcached, Redis, or a TLC SSD) for hot segments of embedding tables while keeping bulk data on PLC/QLC tiers. Consider testing NVMe-oF over RDMA for remote tiers in your environment.
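
As a concrete version of the caching pattern in the last item, here is a minimal sketch of an in-memory LRU layer over an embedding table that lives on the PLC/QLC tier. The file path, table shape, and cache size are hypothetical; production deployments would more likely use Redis, memcached, or a TLC tier as the hot layer:

import numpy as np
from collections import OrderedDict

EMB_PATH = "/mnt/plc/embeddings.f32"   # hypothetical flat float32 table on the PLC tier
DIM = 1024
NUM_ROWS = 100_000_000
CACHE_ROWS = 1_000_000                 # hot rows kept in RAM (~4 GB at 1024 x float32)

table = np.memmap(EMB_PATH, dtype=np.float32, mode="r", shape=(NUM_ROWS, DIM))
cache = OrderedDict()                  # row id -> np.ndarray, least-recently-used evicted first

def get_embedding(row):
    if row in cache:
        cache.move_to_end(row)         # refresh LRU position on a hit
        return cache[row]
    vec = np.array(table[row])         # miss: one read from the SSD tier
    cache[row] = vec
    if len(cache) > CACHE_ROWS:
        cache.popitem(last=False)      # evict the coldest row
    return vec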

Operational patterns and DevOps-friendly tooling

To make PLC or QLC predictable in production, apply DevOps practices familiar to infra teams:

  • Automated benchmarking in CI: Add a nightly job that runs a small, representative fio + sample application workload to detect firmware or host changes that affect latency and throughput.
  • Continuous wear tracking: Export SMART metrics to Prometheus and set runbooks for policy-driven replacement at a conservative threshold (e.g., 50–60% of vendor TBW); a minimal exporter sketch follows this list.
  • Chaos-testing storage behavior: Introduce write storms in staging to observe SLO degradation and ensure your training orchestration (checkpoint retry, resuming) tolerates tail latency spikes.
  • Document reproducible tests: Keep a public (or internal) benchmark repo with fio commands, model scripts, and parsed results so procurement and legal can compare vendor claims to in-house reality.
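
For the wear-tracking item above, a minimal Prometheus exporter sketch in Python, reusing the same nvme smart-log JSON fields as in the endurance section; the port, device list, and metric names are illustrative, and the prometheus_client package is assumed to be installed:

import json
import subprocess
import time
from prometheus_client import Gauge, start_http_server

DEVICES = ["/dev/nvme0n1", "/dev/nvme1n1"]   # adjust to your fleet

percent_used = Gauge("nvme_percent_used", "NVMe SMART percentage used", ["device"])
media_errors = Gauge("nvme_media_errors", "NVMe SMART media errors", ["device"])
tb_written = Gauge("nvme_tb_written", "Approximate terabytes written", ["device"])

def scrape(device):
    out = subprocess.run(
        ["nvme", "smart-log", device, "--output-format=json"],
        capture_output=True, check=True, text=True,
    )
    log = json.loads(out.stdout)
    percent_used.labels(device=device).set(log["percent_used"])
    media_errors.labels(device=device).set(log["media_errors"])
    tb_written.labels(device=device).set(log["data_units_written"] * 512_000 / 1e12)

if __name__ == "__main__":
    start_http_server(9110)   # illustrative exporter port for Prometheus to scrape
    while True:
        for dev in DEVICES:
            scrape(dev)
        time.sleep(60)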

When to pick PLC vs QLC — quick decision matrix

  • Choose PLC if: Read-dominant streaming workloads, you need the lowest cost/TB for archiving large models, and you can absorb shorter endurance with tiering and active monitoring.
  • Choose QLC if: You have frequent checkpoints or write-heavy operations, require higher endurance, or prefer more conservative QoS under mixed IO.
  • Choose TLC or enterprise SSDs if: You need the best sustained mixed workload performance and highest write endurance; use these for hot write buffers and tier-1 caches.

Advanced strategies for minimizing pain

Beyond tiering and monitoring, engineers can adopt several advanced approaches:

  • Write-optimized logs and append-only checkpoints: Design checkpoint formats to be append-friendly and incremental to avoid random overwrites and reduce write amplification (a sketch follows this list).
  • Edge compaction & garbage collection windows: Schedule background compaction/GC during low utilization windows to reduce interference with training and inference peaks.
  • Compression & deduplication: If your datasets and checkpoints compress well, enable inline compression (controller or software) to reduce TBW and improve effective endurance.
  • Leverage NVMe QoS features: Use namespace QoS, host-managed namespaces, or I/O prioritization where supported to isolate training IO from checkpointing IO.
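
For the append-only checkpoint idea above, a minimal sketch of a length-prefixed delta log, assuming PyTorch state dicts; the log path and record layout are illustrative, and a production format would also need an index and integrity checks:

import io
import os
import struct
import torch

LOG_PATH = "/mnt/nvme/checkpoints/incremental.log"   # hypothetical append-only checkpoint log

def append_delta(step, changed_tensors):
    # Serialize only the tensors that changed since the last checkpoint
    payload = io.BytesIO()
    torch.save({"step": step, "state": changed_tensors}, payload)
    record = payload.getvalue()
    with open(LOG_PATH, "ab") as f:                      # append-only: no random overwrites
        f.write(struct.pack("<QI", step, len(record)))   # fixed-size header: step, record length
        f.write(record)
        f.flush()
        os.fsync(f.fileno())

# Example: persist only the layers updated in this interval
model = torch.nn.Linear(16, 16)
append_delta(100, {"linear.weight": model.weight.detach().cpu()})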

Reproducible benchmarks & sample scripts

We publish the exact fio jobs, PyTorch scripts, and FAISS microbench harness in our public repo (link in the CTA). Re-run them with your candidate drives and firmware to produce apples-to-apples comparisons for procurement. For guidance on documenting and publishing reproducible tests, see tools for cloud docs and reproducible artifacts.

Limitations and future work

Our tests use specific enterprise drives and controller firmware available in late 2025. Vendor firmware updates, new PLC manufacturing techniques, and controller-level innovations will change the landscape. We plan to extend this work with:

  • PCIe Gen5 native tests and NVMe-oF over RDMA for remote tiers
  • Longer calendar-life endurance measurements and field telemetry from production fleets
  • Comparisons including TLC and computational storage variants for offloading preprocessing

Actionable checklist before you buy or deploy

  1. Run a representative fio + small model run against candidate drives and firmware. Don’t rely on datasheet sequential speeds alone.
  2. Measure tail latency (p95/p99) during simultaneous reads and writes.
  3. Accelerate a TBW test that mimics your checkpoint/write amplification pattern and monitor SMART.
  4. Design a tiering and caching strategy (hot/TLC, warm/QLC, cold/PLC) and verify failover paths.
  5. Integrate drive telemetry into observability and runbook automation for replacements.

“PLC is a game-changer for read-dominant AI pipelines, but don’t assume it replaces QLC for write-heavy operations — hybrids and telemetry win.”

Conclusion and next steps

In 2026, PLC has transitioned from research curiosity to a practical option for AI infrastructure teams — but it brings trade-offs. Use PLC where sequential read throughput and cost per TB drive value; favor QLC (or TLC) when durability and sustained writes matter. The right answer for most production fleets is a hybrid, telemetry-driven architecture that adapts as drive health and firmware evolve.

Call-to-action

Want the exact fio scripts, PyTorch training harness, and raw CSV results we used? Download the reproducible benchmark suite from our GitHub and run it in your environment. If you’d like help designing a hybrid storage tier for your AI cluster, contact our team for a tailored evaluation and a proof-of-concept demonstrating cost, performance, and endurance trade-offs with your workloads.
