Streaming Mobile Augmented Reality on Mobile Phones
Continuous recognition and tracking of objects in live video captured on a mobile device enables real-time user interaction. We demonstrate a streaming mobile augmented reality system with 1 second latency. User interest is automatically inferred from camera movements, so the user never has to press a button. Our system is used to identify and track book and CD covers in real time on a phone's viewfinder. Efficient motion estimation is performed at 30 frames per second on the phone, while fast search through a database of 20,000 images is performed on a server.
David M. Chen 1, Sam S. Tsai 1, Ramakrishna Vedantham 2, Radek Grzeszczuk 2, Bernd Girod 1
1 Information Systems Laboratory, Stanford University
2 Nokia Research Center, Palo Alto
Mobile augmented reality (MAR) is a wide class of applications in which mobile devices augment users' perception of the world. Many mobile phones that capture video or still images of a scene can automatically recognize and annotate objects in the scene. Existing MAR systems include landmark recognition, product logo recognition, and CD/DVD cover recognition. Real-time augmentation on a phone remains difficult because a MAR system incurs delays in three stages: (1) extraction of query data on the phone, (2) transmission of the query data from the phone over a wireless network to a server hosting an image database, and (3) search through the database. Real-time recognition requires small delays in all three stages, while ensuring high recognition accuracy. Previously reported MAR systems require delays of at least 3 seconds to recognize a newly appearing object. For continuous augmentation of live video, the recognition latency must be reduced to about 1 second. One existing system achieves a delay of around 1 second, but at the expense of continuously streaming video from the mobile phone to the server.

In this paper, a novel MAR system is presented for continuous recognition of book and CD covers in live video captured by a mobile phone, an innovation we refer to as streaming MAR. The user can point the camera at a book or CD and see its identity in the viewfinder in around 1 second. The boundary of the object is displayed and accurately tracked in real time. Both the object's identity and geometry are quickly retrieved from a server hosting a database of 20,000 book and CD images. As the user pans across the scene, the system automatically recognizes new objects that come into view, without the user ever having to press a button. Unlike the continuous-streaming approach, our system performs motion analysis on the phone and selectively decides when to send new query data, rather than continuously transmitting video over a wireless network.

The paper is organized as follows. Sec. 2 presents the design and implementation of the streaming MAR system. Then, Sec. 3 shows the results of recognition tests in which our streaming MAR system is used to recognize many books and CDs in cluttered settings.
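The motion-gated querying idea described above — send a new query only once the camera settles on a new object, instead of streaming every frame — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the mean-absolute-difference motion proxy, the thresholds, and the function names are all assumptions introduced here.

```python
import numpy as np

def global_motion(prev: np.ndarray, curr: np.ndarray) -> float:
    """Cheap proxy for camera motion: mean absolute inter-frame
    pixel difference (cast to int16 to avoid uint8 wraparound)."""
    return float(np.mean(np.abs(curr.astype(np.int16) - prev.astype(np.int16))))

def should_send_query(motion_history, still_thresh=2.0, still_frames=5):
    """Trigger a new server query only after the camera has been
    approximately still for `still_frames` consecutive frames,
    i.e. the user appears to be dwelling on an object of interest.
    Both parameters are illustrative, not values from the paper."""
    if len(motion_history) < still_frames:
        return False
    return all(m < still_thresh for m in motion_history[-still_frames:])

if __name__ == "__main__":
    # Simulate a pan (large motion) followed by the camera coming to rest.
    history = [10.0, 12.0, 9.0] + [0.5, 0.8, 0.4, 0.6, 0.7]
    print(should_send_query(history))   # query fires once the view is stable
```

In a real viewfinder loop, `global_motion` would be replaced by the block-based motion estimation the paper runs at 30 frames per second; the gating logic is the same.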