A modern storage hierarchy combines random-access memory, magnetic disk, and possibly optical disk or magnetic tape to try to keep pace with rapid advances in processor performance. I/O devices such as disks and tapes are considered reliable places to store persistent data such as user files. However, random-access memory is viewed as an unreliable place to store persistent data because it is vulnerable to power outages and operating system crashes. Memory's vulnerability to power outages is straightforward to understand and fix. A $100 uninterruptible power supply can keep a system running long enough to dump memory to disk in the event of a power outage, or one can use non-volatile memory such as Flash RAM.
Memory's vulnerability to operating system crashes is more challenging. Most people would feel nervous if their system crashed while the sole copy of important data was in memory, even if the power stayed on. Consequently, file systems write data periodically to disk, and transaction processing applications view transactions as committed only when data is written to disk.
Applications requiring high reliability, such as transaction processing, write data synchronously through to disk, but this limits performance to that of disk. While optimizations such as logging and group commit can increase effective throughput, they work well only when there are concurrent or delayed operations that can be grouped together, and they cannot improve the latency of individual operations.
Most file systems mitigate the performance lost in synchronous, reliability-induced writes by writing data asynchronously to disk. This allows a greater degree of overlap between CPU time and I/O time. Unfortunately, asynchronous writes make no firm guarantees about when the data is safe on disk; the exact moment depends on the disk queue length and disk speed. On these systems, users must resign themselves to the fact that their data may not be safe on disk when a write or close finishes.
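The difference between the two write styles can be sketched from an application's point of view. The following is a minimal illustration, assuming a POSIX-like system; the function names `write_through`, `write_async`, and `demo` are hypothetical and are not taken from any system described here. Note that even `os.fsync` only asks the operating system to push its cached copy to the device; whether the device has made the data durable can still depend on the hardware.

```python
import os
import tempfile


def write_through(path, data):
    """Return only after the OS has been asked to push the data to the device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # move data from Python's buffer into the OS file cache
        os.fsync(f.fileno())  # block until the OS reports the data written out


def write_async(path, data):
    """Return as soon as the data is in the OS file cache.

    The OS writes it to disk at some later, unspecified time, so a crash
    shortly after this call returns may lose the data.
    """
    with open(path, "wb") as f:
        f.write(data)


def demo():
    # Both files read back correctly while the system stays up; the two
    # calls differ only in what survives a crash right after they return.
    with tempfile.TemporaryDirectory() as d:
        sync_path = os.path.join(d, "sync.dat")
        async_path = os.path.join(d, "async.dat")
        write_through(sync_path, b"committed record")
        write_async(async_path, b"cached record")
        with open(sync_path, "rb") as f:
            sync_data = f.read()
        with open(async_path, "rb") as f:
            async_data = f.read()
        return sync_data, async_data
```

A transaction processing application would use the first style for every commit and pay disk latency each time; most ordinary file writes follow the second style.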
Many file systems improve performance further by delaying some writes to disk in the hope that the new data will be deleted or overwritten. This delay is often set to 30 seconds, which risks the loss of data written within 30 seconds of a crash. Unfortunately, 1/3 to 2/3 of newly written data lives longer than 30 seconds, and this data is eventually written to disk under this policy. File systems differ in how much data is delayed. For example, BSD 4.4 only delays partially written blocks and then only until the file is closed. Systems that delay more types of data and have longer delay periods are better able to decrease disk traffic, but risk losing more data.
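The trade-off in a delayed-write policy can be made concrete with a toy model. The sketch below is an illustration only: the class `DelayedWriteCache` and its methods are hypothetical, time is passed in explicitly rather than read from a clock, and the choice to age a block from the moment it first became dirty is an assumption made for the example, not a description of any particular file system.

```python
class DelayedWriteCache:
    """Toy model of a file cache that delays writes to disk (assumed design)."""

    def __init__(self, delay=30.0):
        self.delay = delay      # seconds a dirty block may stay in memory
        self.dirty = {}         # block id -> (data, time first dirtied)
        self.disk = {}          # block id -> data that has reached disk
        self.disk_writes = 0    # number of blocks actually written to disk

    def write(self, block, data, now):
        # Overwriting a dirty block replaces its data but keeps its original
        # dirty time, so repeated overwrites cannot postpone the flush forever.
        first = self.dirty[block][1] if block in self.dirty else now
        self.dirty[block] = (data, first)

    def delete(self, block):
        # A block deleted while still dirty never costs a disk write.
        self.dirty.pop(block, None)
        self.disk.pop(block, None)

    def flush_due(self, now):
        # Write out every block that has been dirty for at least `delay`.
        due = [b for b, (_, t) in self.dirty.items() if now - t >= self.delay]
        for b in due:
            data, _ = self.dirty.pop(b)
            self.disk[b] = data
            self.disk_writes += 1

    def crash(self):
        # Everything still dirty in memory is lost; return it for inspection.
        lost = {b: d for b, (d, _) in self.dirty.items()}
        self.dirty.clear()
        return lost


def demo():
    cache = DelayedWriteCache(delay=30.0)
    cache.write("a", b"v1", now=0)
    cache.write("tmp", b"scratch", now=5)
    cache.write("a", b"v2", now=10)   # overwrite absorbed in memory
    cache.delete("tmp")               # short-lived data never reaches disk
    cache.flush_due(now=29)           # nothing is old enough yet
    cache.write("b", b"new", now=35)
    cache.flush_due(now=35)           # "a" is now 35 s old and is written
    lost = cache.crash()              # "b" was written 0 s before the crash
    return cache.disk_writes, cache.disk, lost
```

In this run, four application writes cost a single disk write, but the block written just before the crash is lost, which is the exposure window described above.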
Applications that desire maximum performance use a pure write-back scheme where data is written to disk only when the memory is full. This can only be done by applications for which reliability is not an issue, such as compilers that write temporary files.
It is common for file systems to use a combination of write-back strategies. For example, many Unix file systems delay partially written file blocks while initiating asynchronous writes immediately for complete file blocks. However, all these strategies suffer from the same basic trade-off: avoiding disk writes to achieve good performance inevitably leads to a loss of reliability. The goal of the Rio (RAM I/O) file cache is to break this fundamental trade-off by improving the reliability of memory to be comparable to the reliability of disk. Reliable main memory allows Rio to use a pure write-back strategy (no reliability-induced writes to disk) while achieving reliability equivalent to that of a write-through file cache.