Suhani Vora
Suhani is a Research Scientist with a background in Biological Engineering, and is currently applying machine learning methods to the design of biomolecular sequences. She is also interested in applying deep learning to enhance 3D Computer Vision methods.
Authored Publications
NeSF: Neural Semantic Fields for Generalizable Semantic Segmentation of 3D Scenes
Noha Radwan*
Klaus Greff
Henning Meyer
Kyle Genova
Transactions on Machine Learning Research (2022)
Abstract
We present NeSF, a method for producing 3D semantic fields from pre-trained density fields and sparse 2D semantic supervision.
Our method side-steps traditional scene representations by leveraging neural representations where 3D information is stored within neural fields.
In spite of being supervised by 2D signals alone, our method is able to generate 3D-consistent semantic maps from novel camera poses and can be queried at arbitrary 3D points.
Notably, NeSF is compatible with any method producing a density field, and its accuracy improves as the quality of the pre-trained density fields improves.
Our empirical analysis demonstrates comparable quality to competitive 2D and 3D semantic segmentation baselines on convincing synthetic scenes while also offering features unavailable to existing methods.
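As a rough illustration of the rendering step such a method depends on, here is a minimal NumPy sketch (illustrative names only, not the paper's code) that volume-renders per-point semantic logits into a single 2D prediction using weights derived from a pre-trained density field; the sparse 2D semantic supervision would then be applied to the rendered logits:

import numpy as np

def render_semantics_along_ray(sigmas, logits, deltas):
    # sigmas: (N,) densities queried from a pre-trained density field
    # logits: (N, C) per-point semantic logits from the semantic field
    # deltas: (N,) distances between consecutive samples along the ray
    alphas = 1.0 - np.exp(-sigmas * deltas)                          # per-sample opacity
    transmittance = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = transmittance * alphas                                 # NeRF-style rendering weights
    return weights @ logits                                          # (C,) 2D semantic logits for this ray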
Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations
Henning Meyer
Urs Bergmann
Klaus Greff
Noha Radwan
Alexey Dosovitskiy
Jakob Uszkoreit
Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
Abstract
A classical problem in computer vision is to infer a 3D scene representation from few images that can be used to render novel views at interactive rates. Previous work focuses on reconstructing pre-defined 3D representations, e.g. textured meshes, or implicit representations, e.g. radiance fields, and often requires input images with precise camera poses and long processing times for each novel scene.
In this work, we propose the Scene Representation Transformer (SRT), a method which processes posed or unposed RGB images of a new area, infers a "set-latent scene representation", and synthesises novel views, all in a single feed-forward pass. To calculate the scene representation, we propose a generalization of the Vision Transformer to sets of images, enabling global information integration, and hence 3D reasoning. An efficient decoder transformer parameterizes the light field by attending into the scene representation to render novel views. Learning is supervised end-to-end by minimizing a novel-view reconstruction error.
We show that this method outperforms recent baselines in terms of PSNR and speed on synthetic datasets, including a new dataset created for the paper. Further, we demonstrate that SRT scales to support interactive visualization and semantic segmentation of real-world outdoor environments using Street View imagery.
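The encoder/decoder pattern described above (a set encoder producing a latent set of tokens, and a decoder that cross-attends ray queries into it) can be pictured with the following toy PyTorch sketch; it is illustrative only and not the paper's architecture or code:

import torch
import torch.nn as nn

class TinySRT(nn.Module):
    # toy set-encoder / ray-decoder, not the paper's model
    def __init__(self, dim=128):
        super().__init__()
        self.patch_embed = nn.Linear(3 * 8 * 8, dim)                 # flattened 8x8 RGB patches
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)    # pools patches from all views
        self.ray_embed = nn.Linear(6, dim)                           # ray origin + direction
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.to_rgb = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 3))

    def forward(self, patches, rays):
        # patches: (B, num_patches, 192) from all input views, rays: (B, num_rays, 6)
        scene = self.encoder(self.patch_embed(patches))              # set-latent scene representation
        queries = self.ray_embed(rays)
        attended, _ = self.cross_attn(queries, scene, scene)         # decoder attends into the scene
        return torch.sigmoid(self.to_rgb(attended))                  # (B, num_rays, 3) colours

rgb = TinySRT()(torch.randn(1, 256, 192), torch.randn(1, 1024, 6))   # one feed-forward pass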
Kubric: A scalable dataset generator
Anissa Yuenming Mak
Austin Stone
Carl Doersch
Cengiz Oztireli
Charles Herrmann
Daniel Rebain
Derek Nowrouzezahrai
Dmitry Lagun
Fangcheng Zhong
Florian Golemo
Francois Belletti
Henning Meyer
Hsueh-Ti (Derek) Liu
Issam Laradji
Klaus Greff
Kwang Moo Yi
Matan Sela
Noha Radwan
Thomas Kipf
Tianhao Wu
Vincent Sitzmann
Yilun Du
Yishu Miao
(2022)
Abstract
Data is the driving force of machine learning. The amount and quality of training data is often more important for the performance of a system than the details of its architecture. Data is also an important tool for testing specific hypotheses and for empirically evaluating the behaviour of complex systems. However, collecting and annotating real data at scale is difficult and expensive. Synthetic data generation is a powerful tool that can address these shortcomings: 1) it is cheap, 2) it supports rich ground-truth annotations, 3) it offers full control over the data, and 4) it can circumvent privacy and legal concerns. Unfortunately, the toolchain for generating data is less well developed than that for building models. We aim to improve this situation by introducing Kubric: a scalable open-source pipeline for generating realistic image and video data with rich ground-truth annotations.
We also publish a collection of generated datasets and baseline results on several vision tasks.
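For orientation, a "hello world" along the lines of the example in the Kubric repository is sketched below; it is adapted from memory, so the exact class and function names should be checked against the project's current API:

import kubric as kb
from kubric.renderer.blender import Blender as KubricRenderer

scene = kb.Scene(resolution=(256, 256))
scene += kb.Cube(name="floor", scale=(10, 10, 0.1), position=(0, 0, -0.1))
scene += kb.Sphere(name="ball", scale=1, position=(0, 0, 1.0))
scene += kb.DirectionalLight(name="sun", position=(-1, -0.5, 3), look_at=(0, 0, 0), intensity=1.5)
scene += kb.PerspectiveCamera(name="camera", position=(3, -1, 4), look_at=(0, 0, 1))

renderer = KubricRenderer(scene)
frame = renderer.render_still()                                      # RGBA plus ground-truth layers
kb.write_png(frame["rgba"], "helloworld.png")
kb.write_palette_png(frame["segmentation"], "helloworld_segmentation.png")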
A Comparison of Generative Models for Sequence Design
David Dohan
Ramya Deshpande
Olivier Chapelle
Babak Alipanahi
Machine Learning in Computational Biology Workshop (2019)
Abstract
In this paper, we compare generative models of different complexity for designing DNA and protein sequences using the Cross Entropy Method.
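As a point of reference, the Cross Entropy Method paired with the simplest kind of generative model, an independent per-position categorical distribution, can be sketched as follows; the fitness oracle is a stand-in and every name here is illustrative:

import numpy as np

ALPHABET = "ACGT"                                   # DNA alphabet
SEQ_LEN, POP, ELITE, ITERS = 20, 200, 20, 50
rng = np.random.default_rng(0)

def score(seqs):
    # stand-in fitness oracle: count of "GC" pairs; a trained model would go here
    return np.array([sum(s[i:i + 2] == "GC" for i in range(len(s) - 1)) for s in seqs])

probs = np.full((SEQ_LEN, len(ALPHABET)), 1.0 / len(ALPHABET))       # independent per-position model

for _ in range(ITERS):
    idx = np.stack([rng.choice(len(ALPHABET), size=POP, p=probs[i]) for i in range(SEQ_LEN)], axis=1)
    seqs = ["".join(ALPHABET[j] for j in row) for row in idx]
    elite = idx[np.argsort(score(seqs))[-ELITE:]]                    # keep the top-scoring sequences
    counts = np.stack([np.bincount(elite[:, i], minlength=len(ALPHABET)) for i in range(SEQ_LEN)])
    probs = 0.9 * (counts / ELITE) + 0.1 * probs                     # smoothed refit to the elites

print("best sampled sequence:", max(seqs, key=lambda s: score([s])[0]))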
Biological Sequences Design using Batched Bayesian Optimization
Zelda Mariet
Ramya Deshpande
David Dohan
Olivier Chapelle
NeurIPS workshop on Bayesian Deep Learning (2019)
Abstract
Being able to effectively design biological sequences like DNA and proteins would have a transformative impact on medicine. Currently, the most popular design method in the life sciences is directed evolution, which explores sequence space by making small mutations to existing sequences. Alternatively, Bayesian optimization (BO) provides an attractive framework for model-based black-box optimization, and has achieved many recent successes in life-sciences applications. However, within the ML community, most large-scale BO efforts have focused on hyper-parameter tuning. These methods often do not translate to biological sequence design, where the search space is over a discrete alphabet, wet-lab experiments are run with considerable parallelism (1K-100K sequences per batch), and experiments are sufficiently slow and expensive that only a few rounds of experiments are feasible. This paper discusses the particularities of batched BO on a large discrete space, and investigates the design choices that must be made in order to obtain robust, scalable, and experimentally successful models within this unique context.
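One concrete way to realize batched proposals in this setting is Thompson sampling from a Bayesian linear surrogate over one-hot sequence features, sketched below; this is an illustrative example of a batch-design strategy, not the paper's method, and every name, constant, and oracle in it is hypothetical:

import numpy as np

rng = np.random.default_rng(1)
ALPHABET, L, BATCH = "ACDEFGHIKLMNPQRSTVWY", 10, 100                 # protein alphabet, length, batch size

def featurize(seqs):
    # one-hot encode sequences into flat feature vectors
    index = {a: i for i, a in enumerate(ALPHABET)}
    X = np.zeros((len(seqs), L * len(ALPHABET)))
    for n, s in enumerate(seqs):
        for i, c in enumerate(s):
            X[n, i * len(ALPHABET) + index[c]] = 1.0
    return X

def propose_batch(measured, labels, candidates, noise=0.1, prior=1.0):
    # Bayesian linear surrogate; one posterior (Thompson) draw per batch slot
    X, Xc = featurize(measured), featurize(candidates)
    cov = np.linalg.inv(X.T @ X / noise**2 + np.eye(X.shape[1]) / prior**2)
    mean = cov @ X.T @ labels / noise**2
    picks, chosen = [], set()
    for _ in range(BATCH):
        w = rng.multivariate_normal(mean, cov)                        # sample plausible model weights
        best = next(int(i) for i in np.argsort(Xc @ w)[::-1] if int(i) not in chosen)
        picks.append(best)
        chosen.add(best)
    return [candidates[i] for i in picks]

measured = ["".join(rng.choice(list(ALPHABET), L)) for _ in range(50)]  # previously assayed sequences
labels = rng.normal(size=50)                                            # their (toy) measurements
pool = ["".join(rng.choice(list(ALPHABET), L)) for _ in range(5000)]    # candidate library
batch = propose_batch(measured, labels, pool)                           # next wet-lab batch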
Future Semantic Segmentation Leveraging 3D Information
Soeren Pirk
ECCV 3D Reconstruction meets Semantics Workshop (2018)
Abstract
Predicting the future to anticipate the outcome of events and actions is a critical attribute of autonomous agents. In this work, we address the task of predicting future frame segmentation from a stream of monocular video by leveraging the 3D structure of the scene. Our framework is based on learnable sub-modules capable of predicting pixel-wise scene semantic labels, depth, and camera ego-motion of adjacent frames. Ultimately, we observe that leveraging 3D structure in the model facilitates successful positioning of objects in the 3D scene, achieving state-of-the-art accuracy in future semantic segmentation.
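The 3D step can be pictured as carrying current-frame labels into the future frame given predicted depth and ego-motion; the NumPy sketch below shows only that geometric warp with illustrative names, whereas the paper's sub-modules are learned:

import numpy as np

def warp_labels(labels, depth, K, R, t):
    # reproject current-frame semantic labels into a future frame
    # labels: (H, W) class ids, depth: (H, W), K: (3, 3) camera intrinsics
    # R, t: predicted ego-motion (rotation and translation) to the future camera
    H, W = labels.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T     # (3, H*W) homogeneous pixels
    points = np.linalg.inv(K) @ pix * depth.reshape(-1)                   # back-project to 3D
    points = R @ points + t[:, None]                                      # apply ego-motion
    proj = K @ points
    uv = np.round(proj[:2] / proj[2]).astype(int)                         # perspective divide
    warped = np.full((H, W), -1, dtype=int)                               # -1 marks unknown pixels
    ok = (uv[0] >= 0) & (uv[0] < W) & (uv[1] >= 0) & (uv[1] < H) & (proj[2] > 0)
    warped[uv[1, ok], uv[0, ok]] = labels.reshape(-1)[ok]
    return warped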
Abstract
Predicting the future to anticipate the outcome of events and actions is a critical attribute of autonomous agents; particularly for agents which must rely heavily on real-time visual data for decision making. Working towards this capability, we address the task of predicting future frame segmentation from a stream of monocular video by leveraging the 3D structure of the scene. Our framework is based on learnable sub-modules capable of predicting pixel-wise scene semantic labels, depth, and camera ego-motion of adjacent frames. We further propose a recurrent neural network based model capable of predicting future ego-motion trajectory as a function of a series of past ego-motion steps. Ultimately, we observe that leveraging 3D structure in the model facilitates successful prediction, achieving state-of-the-art accuracy in future semantic segmentation.
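A minimal PyTorch sketch of the kind of recurrent ego-motion predictor described, with illustrative names and sizes rather than the paper's model:

import torch
import torch.nn as nn

class EgoMotionRNN(nn.Module):
    # GRU that maps a history of 6-DoF ego-motion steps to the next step
    def __init__(self, dim=64):
        super().__init__()
        self.gru = nn.GRU(input_size=6, hidden_size=dim, batch_first=True)
        self.head = nn.Linear(dim, 6)

    def forward(self, past_steps):                     # (B, T, 6) past ego-motion steps
        out, _ = self.gru(past_steps)
        return self.head(out[:, -1])                   # (B, 6) predicted next step

next_step = EgoMotionRNN()(torch.randn(2, 8, 6))       # e.g. eight past steps -> next step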