Error correction software for Bose-Chaudhuri-Hochquenghem (BCH) codes is optimized for general purpose processors that do not equip hardware for Galois field arithmetic. The developed software applies parallelization with a table lookup method to reduce the number of iterations, and maximum parallelization under a cache size limitation is sought for a high throughput implementation. Since this method minimizes the number of lookup tables for encoding and decoding processes, a large parallel factor can be chosen for a given cache size.The naive word length of a general purpose CPU is used as a whole by employing the developed mask elimination method. The tradeoff of the algorithm complexity and the regularity is examined for several syndrome generation methods, which leads to a simple error detection scheme that reuses the encoder and a simplified syndrome generation method requiring only a small number of Galois field multiplications. The parallel factor for Chien search is increased much by transforming the error locator polynomial so that it contains symmetric exponents of positive and negative signs. The experimental results demonstrate that the developed software cannot only provide sufficient throughput for real-time error correction of NAND flash memory in embedded systems but also enhance the reliability of file systems in general purpose computers.