Data-centric platform for Computer Vision

Deep Video Analytics aims to revolutionize visual data analysis by providing a comprehensive platform for storage, analysis & sharing.


Upload videos or sets of images. Download videos from YouTube URLs automatically. Browse & annotate uploaded videos. Import pre-indexed datasets.


Perform scene detection and frame extraction on videos. Annotate frames and detections with bounding boxes, labels and metadata.


Extracted objects, along with entire frames and crops, are indexed using deep features. Feature vectors are used for visual search retrieval.


Deploy on a variety of machines, with or without GPUs, locally or in the cloud. Docker Compose automates the setup of Postgres & RabbitMQ.

Features & Models

We put significant effort into ensuring that the following models (code and weights included) work without you having to write any code.


  • Visual Search as a primary interface

  • Upload videos and image datasets.

  • Ingest from various sources such as AWS S3 and YouTube.

  • Pre-trained recognition, detection & OCR models.

  • Train custom detector models

  • User Interface for visualization, annotation & monitoring.

  • REST API to simplify development of new front-end applications.

  • Deep Video Analytics Processing and Query Language for specifying tasks

  • Videos, frames, indexes, etc. are stored in the media directory and served through nginx.

  • Perform full-text search on text metadata and names.

  • Configure by specifying environment variables.

  • Manage GPU memory/utilization by dynamically launching & shutting down workers.
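As a rough illustration of the last point, the sketch below decides whether a new worker could start based on free GPU memory. It is not the project's actual worker-management code: the function names `gpu_free_mib` and `can_launch_worker` and the threshold policy are hypothetical. The only real interface assumed is the `nvidia-smi --query-gpu` query shown in the comment, whose output is replaced by canned input here so the example runs without a GPU.

```shell
# Read lines of "used, total" (in MiB), as printed by
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
# and print the free memory in MiB for each GPU.
gpu_free_mib() {
  awk -F',' '{ print $2 - $1 }'
}

# Hypothetical policy: succeed if the first GPU has at least $1 MiB free.
can_launch_worker() {
  needed="$1"
  free="$(gpu_free_mib | head -n 1)"
  [ "${free:-0}" -ge "$needed" ]
}

# Example with canned input instead of a real GPU query.
if echo "1024, 8192" | can_launch_worker 4000; then
  echo "launch worker"
else
  echo "wait"
fi
```

On a machine with an Nvidia GPU, the canned `echo` could be replaced by the `nvidia-smi` query from the comment.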


Import external datasets using VDN


  • Labeled Faces in the Wild

Train new models

  • Fine-tune the YOLO v2 detector using a custom set of regions.

  • Start using the trained detector instantly by launching workers that process the queue assigned to the new custom detector.


For a quick overview of the design choices and vision behind this project, we strongly recommend going through the following presentation, which is also stored as readme.pdf inside the repo.


Coming Soon!

Deep Video Analytics Processing & Query Language (DVAPQL)

DVAPQL enables processing and querying of visual data in a consistent manner using Deep Video Analytics. All functionality of Deep Video Analytics can be expressed in the form of DVAPQL scripts, which are then launched on distributed workers. The data model underlying DVA makes it simple to reason about the state of the system.

DVAPQL specification & examples of tasks/models/datasets


Pre-built docker images for both CPU & GPU versions are available on Docker Hub.

Machines without an Nvidia GPU

Deep Video Analytics is implemented using Docker and works on Mac, Windows and Linux. Make sure you have the latest version of Docker installed.

git clone
cd DeepVideoAnalytics/deploy/demo && docker-compose up
# The command above automatically pulls container images from Docker Hub
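Before running the commands above, it may help to confirm that the required tools are on your PATH. The following is a minimal sketch; the helper name `have_cmd` is made up for illustration, and the check only looks for the `docker` and `docker-compose` executables without verifying that their versions are current.

```shell
# Return success if the named command is available on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Report each required tool; this does not change anything on the machine.
for tool in docker docker-compose; do
  if have_cmd "$tool"; then
    echo "$tool: found"
  else
    echo "$tool: missing"
  fi
done
```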

Machines with Nvidia GPU

You need to have the latest versions of Docker and nvidia-docker installed. The GPU Dockerfile is slightly different from the CPU Dockerfile.

pip install --upgrade nvidia-docker-compose
git clone
cd DeepVideoAnalytics/deploy/single
nvidia-docker-compose -f docker-compose-gpu.yml up
# The command above automatically pulls container images from Docker Hub

Security warning

When deploying or running on remote Ubuntu machines on VPS services such as Linode, beware of the Docker/UFW firewall issue: Docker bypasses the UFW firewall and opens port 8000 to the internet. You can change this behavior by binding the port to the loopback interface and then forwarding port 8000 over an SSH tunnel; an example of this is shown here.
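One way to apply this advice is sketched below. The `ports` mapping in the comment is standard Docker Compose syntax, but whether the project's compose files are set up this way is an assumption. The helper name `dva_tunnel_cmd` and the `user@remote-host` placeholder are hypothetical; the helper only prints the SSH command rather than running it.

```shell
# Assumption: on the remote machine, the compose file publishes the web port
# on the loopback address only, for example:
#   ports:
#     - "127.0.0.1:8000:8000"

# Hypothetical helper: print the SSH command that forwards a local port to
# the same port on the remote machine's loopback interface.
dva_tunnel_cmd() {
  port="$1"
  target="$2"   # placeholder such as user@remote-host
  printf 'ssh -N -L %s:127.0.0.1:%s %s\n' "$port" "$port" "$target"
}

# Print the command; run it yourself once the placeholder is replaced.
dva_tunnel_cmd 8000 user@remote-host
```

With such a tunnel open, the service on port 8000 would be reachable at localhost:8000 on your own machine without being exposed to the internet.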

Architecture & Deployment

Deep Video Analytics can be deployed in the cloud in a scalable, cost-effective manner that leverages spot pricing and cheap storage without significant changes to the codebase. This website and its associated applications are deployed using this method.

Paper & Citation

Coming Soon!


  1. Schroff, Florian, Dmitry Kalenichenko, and James Philbin. "Facenet: A unified embedding for face recognition and clustering." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.

  2. Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.

  3. Zhang, Kaipeng, et al. "Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks." IEEE Signal Processing Letters 23.10 (2016): 1499-1503.

  4. Liu, Wei, et al. "SSD: Single shot multibox detector." European Conference on Computer Vision. Springer International Publishing, 2016.

  5. Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.

  6. Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.

  7. Johnson, Jeff, Matthijs Douze, and Hervé Jégou. "Billion-scale similarity search with GPUs." arXiv preprint arXiv:1702.08734 (2017).

Issues, Questions & Contact

Please submit all software-related bugs and questions using GitHub issues. For other questions you can contact me at

© 2017 Akshay Bhat, Cornell University.
All rights reserved.