Representative Publications

  1. Learning Depth from Single Monocular Images, Ashutosh Saxena, Sung H. Chung, Andrew Y. Ng. In Neural Information Processing Systems (NIPS) 18, 2005. [pdf]

    (Infer a depth map from a single still image. More.)
  2. Make3D: Learning 3-D Scene Structure from a Single Still Image, Ashutosh Saxena, Min Sun, Andrew Y. Ng. In IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2008. [pdf]

    (Journal version of ICCV-3dRR'07 and ICCV-VRML'07 with additional details on the algorithm, including learning and inference.)
  3. 3-D Depth Reconstruction from a Single Still Image, Ashutosh Saxena, Sung H. Chung, Andrew Y. Ng. International Journal of Computer Vision (IJCV), Aug 2007. [pdf]

    (Journal version of NIPS 2005 and IJCAI 2007 papers. More.)
  4. Depth Estimation using Monocular and Stereo Cues, Ashutosh Saxena, Jamie Schulte, Andrew Y. Ng. In International Joint Conference on Artificial Intelligence (IJCAI), 2007. [pdf]

    (Monocular cues were used to improve the performance of stereo vision. More.)

Other Publications

  1. Learning the Right Model: Efficient Max-Margin Learning in Laplacian CRFs, Dhruv Batra, Ashutosh Saxena. In Computer Vision and Pattern Recognition (CVPR), 2012. [PDF]

  2. Make3D: Depth Perception from a Single Still Image, Ashutosh Saxena, Min Sun, Andrew Y. Ng. In AAAI (Nectar track), 2008. [pdf]

    (This paper provides intuition on the general idea of depth perception and its applications to various fields. Giving a high-level description (i.e., no mathematics), it summarizes our papers from 2005 through 2007.)
  3. Learning 3-D Scene Structure from a Single Still Image, Ashutosh Saxena, Min Sun, Andrew Y. Ng. In ICCV workshop on 3D Representation for Recognition (3dRR-07), 2007. (Best paper award.) [pdf, ppt, Talk]

    (State-of-the-art algorithm as of Nov 2006, which produces quantitatively accurate 3-d depths as well as visually pleasing, fully textured 3-d flythroughs from an image. It proposes learning a novel parameterization suited to scene understanding. More.)
  4. Cascaded Classification Models: Combining Models for Holistic Scene Understanding, Geremy Heitz, Stephen Gould, Ashutosh Saxena, Daphne Koller. In Neural Information Processing Systems (NIPS), 2008. (full oral) [pdf, more]

    (We develop a learning method that couples sub-tasks of object detection, scene categorization, image segmentation, and 3-d reconstruction for improving performance in each of them.)
  5. Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models,
    Congcong Li, Adarsh Kowdle, Ashutosh Saxena, Tsuhan Chen. In Neural Information Processing Systems (NIPS), 2010. [pdf, More]

  6. A generic model to compose vision modules for holistic scene understanding,
    Adarsh Kowdle, Congcong Li, Ashutosh Saxena and Tsuhan Chen. In European Conference on Computer Vision Workshop on Parts and Attributes (ECCV '10), 2010. [pdf, slides, More]

  7. 3-D Reconstruction from Sparse Views using Monocular Vision, Ashutosh Saxena, Min Sun, Andrew Y. Ng. In ICCV workshop on Virtual Representations and Modeling of Large-scale environments (VRML), 2007. [pdf]

    (Algorithm to build large-scale 3-d models from a few images, even with little or no overlap. More.)
  8. i23 - Rapid Interactive 3D Reconstruction from a Single Image, Savil Srivastava, Ashutosh Saxena, Christian Theobalt, Sebastian Thrun, Andrew Y. Ng. In Vision, Modelling and Visualization (VMV), 2009. [pdf]

    (A human in the loop is used to create 3D models quickly.)
  9. High Speed Obstacle Avoidance using Monocular Vision and Reinforcement Learning, Jeff Michels, Ashutosh Saxena, Andrew Y. Ng. In International Conference on Machine Learning (ICML), 2005. [pdf, ppt]

    (A simplified version of the algorithm was used to drive an RC car autonomously in real time. More.)

Others

  1. Learning Sound Location from a Single Microphone, Ashutosh Saxena, Andrew Y. Ng. In International Conference on Robotics and Automation (ICRA), 2009. [pdf]

    (Infer incident angle of sound using a single microphone.)
  2. Rapid Interactive 3D Reconstruction from a Single Still Image, Ashutosh Saxena, Nuwan Senaratna, Savil Srivastava, Andrew Y. Ng. In SIGGRAPH Late Breaking work (Informal Session), 2008. [1-page pdf, Video]

    (Create 3D models/flythroughs from user scribbles, requiring just a few seconds of user input.)
  3. Building a 3-D Model from a Single Still Image, Ashutosh Saxena and Andrew Y. Ng, Demonstration in Neural Information Processing Systems (NIPS), 2007.

    Also presented at NIPS Workshop on The Grammar of Vision: Probabilistic Grammar-Based Models for Visual Scene Understanding and Object Categorization, 2007. [png]

    Also presented at AAAI IS Demonstration, 2008.

    (Showing the beta version of the technology and the website. More.)

 

Suggested readings: The ICCV-3dRR 2007, IJCV 2007, and NIPS 2005 papers give good coverage of the monocular vision work. For a more informal summary, read the AAAI Nectar 2008 paper.

 

Previous Results

  1. See Nov 2006 models (588 from internet images in ICCV-3dRR, and large-scale ones in ICCV-VRML) here.

    (For viewing VRMLs, we suggest the Cortona plugin for Windows, and this for Linux. It takes less than a minute to install.)

  2. See NIPS 2005 results here.

 
