Data-centric platform for Computer Vision

Deep Video Analytics aims to revolutionize visual data analysis by providing a comprehensive platform for storage, analysis & sharing.


Upload videos or sets of images. Download videos from YouTube URLs automatically. Browse and annotate uploaded videos. Import pre-indexed datasets.


Perform scene detection and frame extraction on videos. Annotate frames and detections with bounding boxes, labels and metadata.


Extracted objects, along with entire frames and crops, are indexed using deep features. The resulting feature vectors are used for visual search and retrieval.
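As a rough illustration of how feature vectors can drive visual search, the sketch below ranks indexed items by cosine similarity to a query vector. The three-dimensional vectors, the entry names and the functions `cosine_similarity` and `search` are all hypothetical; the platform itself derives features from deep networks and may use a different index structure and distance measure.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as matching nothing
    return dot / (norm_a * norm_b)


def search(
    index: Dict[str, Sequence[float]],
    query: Sequence[float],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k index entries most similar to the query, best first."""
    scored = [(key, cosine_similarity(vec, query)) for key, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical index: identifiers mapped to made-up feature vectors
# (assumption: real features would have hundreds or thousands of dimensions).
index = {
    "video1/frame_10": [0.9, 0.1, 0.0],
    "video1/frame_42/crop_2": [0.0, 1.0, 0.2],
    "video2/frame_7": [0.8, 0.2, 0.1],
}
results = search(index, query=[1.0, 0.0, 0.0], k=2)
print(results)
```

A linear scan like this is only practical for small collections; large indexes typically rely on approximate nearest-neighbor libraries instead.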


Deploy on a variety of machines, with or without GPUs, locally or in the cloud. Docker and Kubernetes enable scalable deployment across clouds.

We put significant effort into ensuring that the following models (code and weights included) work without requiring you to write any code.


  • Visual Search as a primary interface

  • Upload videos and image datasets

  • Ingest from various sources such as AWS S3 and YouTube

  • Pre-trained recognition, detection & OCR models.

  • Train custom detector models

  • User Interface for visualization, annotation & monitoring

  • REST API to simplify development of new front-end applications

  • Deep Video Analytics Processing and Query Language for specifying tasks

  • Videos, frames, indexes, etc. stored in media directory, served through nginx

  • Perform full-text search on text metadata and names

  • Configure by specifying environment variables

  • Manage GPU memory/utilization by dynamically managing workers
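To illustrate the environment-variable style of configuration mentioned in the list above, here is a minimal sketch of reading settings with defaults. The variable names (`EXAMPLE_MEDIA_DIR`, `EXAMPLE_ENABLE_GPU`, `EXAMPLE_WORKER_COUNT`) and the `Settings` class are hypothetical and are not the variables Deep Video Analytics actually reads.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    media_dir: str
    enable_gpu: bool
    worker_count: int


def _as_bool(value: str) -> bool:
    """Interpret common truthy strings; everything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables, falling back to defaults."""
    return Settings(
        # All three variable names are made up for this example.
        media_dir=env.get("EXAMPLE_MEDIA_DIR", "/tmp/media"),
        enable_gpu=_as_bool(env.get("EXAMPLE_ENABLE_GPU", "0")),
        worker_count=int(env.get("EXAMPLE_WORKER_COUNT", "1")),
    )


# Passing a plain dict instead of os.environ keeps the example reproducible.
settings = load_settings({"EXAMPLE_ENABLE_GPU": "true", "EXAMPLE_WORKER_COUNT": "4"})
print(settings)
```

Accepting the environment as a parameter makes the loader easy to test without modifying the real process environment.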


  • Indexing using Google Inception v3 trained on ImageNet

  • Multiple object detectors from the TensorFlow Object Detection API

  • Face detection/alignment/recognition using MTCNN and FaceNet

  • Open Images multi-label Inception v3 for text tags

  • Deep OCR using CTPN & CRNN

Import external datasets using VDN


  • Labeled Faces in the Wild

Train new models

  • Fine-tune a YOLO v2 detector using a custom set of regions

  • Start using the trained detector instantly by launching workers that process the queue assigned to the detector
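The idea of workers consuming a queue assigned to a detector can be sketched with Python's standard library. This is an in-process toy under stated assumptions, not the project's actual task system: `run_workers`, `fake_detector` and the frame names are hypothetical, and a real deployment would use a distributed task queue and a trained model.

```python
import queue
import threading
from typing import Callable, List

_STOP = object()  # sentinel telling a worker to exit


def run_workers(
    tasks: List[str],
    handler: Callable[[str], str],
    num_workers: int = 2,
) -> List[str]:
    """Process tasks from one shared queue using num_workers threads."""
    task_queue: "queue.Queue[object]" = queue.Queue()
    results: List[str] = []
    lock = threading.Lock()

    def worker() -> None:
        while True:
            item = task_queue.get()
            if item is _STOP:
                break
            output = handler(item)  # run the detector on one task
            with lock:
                results.append(output)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for task in tasks:
        task_queue.put(task)
    for _ in threads:
        task_queue.put(_STOP)  # one sentinel per worker
    for t in threads:
        t.join()
    return results


def fake_detector(frame_name: str) -> str:
    """Stand-in for a trained detector; always reports no detections."""
    return f"{frame_name}: 0 detections"


outputs = run_workers(["frame_1", "frame_2", "frame_3"], fake_detector)
print(sorted(outputs))
```

Because each detector has its own queue, adding capacity amounts to starting more workers on that queue; completion order across workers is not guaranteed, which is why the example sorts the results before printing.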

Visual Data Network

Visual Data Network enables easy sharing of datasets, annotations, models and scripts.

Visit the VDN GitHub repo


Coming Soon!

Quick demo & tutorial

Deep Video Analytics is implemented using Docker and works on Mac, Windows and Linux with the latest versions of Docker and docker-compose installed.

The first time you start it, you will need to wait a few minutes for the images to be pulled and the models to be downloaded.

git clone
cd DeepVideoAnalytics
python start cpu
# Wait a few minutes for the container images to be downloaded
# You should then be able to use both the Web UI and Jupyter
# To stop containers but retain volumes run
python stop cpu
# To stop and delete containers and volumes run
python clean cpu


Single machine without GPU

You need to have the latest versions of Docker and docker-compose installed.

Deploy on a machine without GPU

Single machine with NVIDIA GPU

You need to have nvidia-docker2 and a compatible version of Docker installed.

Deploy on a machine with NVIDIA GPU

Kubernetes with cloud storage (S3/GCS)

DVA can be deployed on a Kubernetes cluster with GCS or S3 as the media store.

Deploy on Google Cloud Platform


For a quick overview of the design choices and vision behind this project, we strongly recommend going through the following presentation.

Deep Video Analytics uses DVAPQL (Deep Video Analytics Processing and Query Language) to process and query visual data in a consistent manner. The DVAPQL specification and examples can be found here

Paper & Citation

Coming Soon!


  1. Schroff, Florian, Dmitry Kalenichenko, and James Philbin. "Facenet: A unified embedding for face recognition and clustering." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.

  2. Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.

  3. Zhang, Kaipeng, et al. "Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks." IEEE Signal Processing Letters 23.10 (2016): 1499-1503.

  4. Liu, Wei, et al. "SSD: Single shot multibox detector." European Conference on Computer Vision. Springer International Publishing, 2016.

  5. Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.

  6. Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.

  7. Johnson, Jeff, Matthijs Douze, and Hervé Jégou. "Billion-scale similarity search with GPUs." arXiv preprint arXiv:1702.08734 (2017).

Issues, Questions & Contact

Please submit all software-related bugs and questions using GitHub issues. For other questions you can contact me at

© 2017 Akshay Bhat, Cornell University.
All rights reserved.