There is an increasing number of tools for the automatic generation of HTML. Some of these, especially the "Save As" facilities in certain office automation packages, generate very poor-quality HTML. The project is to develop software that will process such HTML and clean it up, removing unnecessary and invalid markup and reducing the size of the marked-up document as much as possible. The software could include options to validate compliance with various levels of HTML, and could ultimately be offered as an online service.
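
As a rough illustration of the clean-up step described above, the following Python sketch uses only the standard library's html.parser module. It is a starting point under stated assumptions, not a proposed design: the names HtmlCleaner and clean_html, and the particular sets of tags and attributes treated as unnecessary, are hypothetical choices made for the example. It drops comments, unwraps purely presentational tags, and strips presentational attributes. It does not validate the result, and it does not treat script or style content specially.

```python
from html import escape
from html.parser import HTMLParser

# Assumed clean-up policy, chosen only for illustration.
DROP_TAGS = {"font", "span", "o:p"}      # unwrap: discard the tag, keep its text
DROP_ATTRS = {"style", "class", "lang"}  # presentational attributes to strip
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}  # never get an end tag


class HtmlCleaner(HTMLParser):
    """Re-emits the parsed document, omitting markup the policy rejects."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_decl(self, decl):
        # Preserve the doctype declaration as written.
        self.out.append(f"<!{decl}>")

    def handle_starttag(self, tag, attrs):
        if tag in DROP_TAGS:
            return
        parts = [tag]
        for name, value in attrs:
            if name in DROP_ATTRS:
                continue
            # Attributes without a value (e.g. "disabled") are kept bare.
            parts.append(name if value is None
                         else f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        if tag in DROP_TAGS or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Character references were decoded by the parser, so re-escape them.
        self.out.append(escape(data, quote=False))

    # Comments (including office-suite conditional comments) are dropped
    # because handle_comment is deliberately not overridden.


def clean_html(source: str) -> str:
    parser = HtmlCleaner()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    messy = ('<p class="MsoNormal" style="margin:0"><font face="Arial">'
             '<span lang="EN-GB">Hello &amp; welcome</span></font><br/></p>')
    tidy = clean_html(messy)
    print(tidy)
    print(f"{len(messy)} characters reduced to {len(tidy)}")
```

A real tool would need a configurable policy, proper handling of script and style elements, and a separate validation pass against the chosen level of HTML.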