Upload videos or sets of images. Download YouTube URLs automatically. Browse and annotate uploaded videos. Import pre-indexed datasets.
Perform scene detection and frame extraction on videos. Annotate frames and detections with bounding boxes, labels, and metadata.
Extracted objects, along with entire frames and crops, are indexed using deep features. The resulting feature vectors are used for visual search and retrieval.
Deploy on a variety of machines, with or without GPUs, locally or in the cloud. Docker and Kubernetes enable scalable deployment across clouds.
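As a rough illustration of how feature vectors can drive visual search, the sketch below ranks indexed items by cosine similarity to a query vector. It is a minimal, standard-library example and not the project's actual indexing code: the function names, the toy three-dimensional vectors, and the key naming scheme are all assumptions made for illustration. Real deep features would have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def search(index, query, k=2):
    """Return the k entries most similar to the query, best match first."""
    scored = [(cosine_similarity(vec, query), key) for key, vec in index.items()]
    scored.sort(reverse=True)
    return [(key, score) for score, key in scored[:k]]


# Hypothetical toy index: each key names a frame or a crop,
# each value stands in for that item's feature vector.
index = {
    "video1/frame_10": [0.9, 0.1, 0.0],
    "video1/frame_10/crop_2": [0.1, 0.9, 0.1],
    "video2/frame_3": [0.8, 0.2, 0.1],
}

results = search(index, [1.0, 0.0, 0.0], k=2)
for key, score in results:
    print(key, round(score, 3))
```

A brute-force scan like this is only practical for small indexes. Larger collections usually rely on an approximate nearest-neighbor library instead.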
Visual Search as a primary interface
Upload videos and image datasets.
Ingest from various sources such as AWS S3 and YouTube.
Pre-trained recognition, detection & OCR models.
Train custom detector models
User Interface for visualization, annotation & monitoring
REST API to simplify development of new front-end applications
Deep Video Analytics Processing and Query Language for specifying tasks
Videos, frames, indexes, etc. are stored in the media directory and served through nginx
Perform full-text search on text metadata and names
Configure by specifying environment variables
Manage GPU memory/utilization by dynamically managing workers
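Configuration through environment variables can be sketched as follows. The variable names (`EXAMPLE_MEDIA_ROOT`, `EXAMPLE_USE_GPU`, `EXAMPLE_MAX_WORKERS`) and their defaults are hypothetical placeholders, not the names the project reads. Consult the project's own documentation for the real settings.

```python
import os


def load_config(env=None):
    """Build a settings dict from environment variables.

    All variable names below are hypothetical and used only for illustration.
    """
    env = os.environ if env is None else env
    return {
        # Directory where videos, frames, and indexes would be stored
        "media_root": env.get("EXAMPLE_MEDIA_ROOT", "/tmp/media"),
        # "1" enables GPU workers, anything else disables them
        "use_gpu": env.get("EXAMPLE_USE_GPU", "0") == "1",
        # Upper bound on concurrently running workers
        "max_workers": int(env.get("EXAMPLE_MAX_WORKERS", "1")),
    }


# Passing a plain dict makes the function easy to try without
# touching the real process environment.
config = load_config({"EXAMPLE_USE_GPU": "1"})
print(config)
```

Reading every setting in one place with an explicit default keeps a container's behavior predictable when a variable is left unset.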
Indexing using Google Inception v3 trained on ImageNet
Multiple object detectors from TF object detection API
Face detection/alignment/recognition using MTCNN and FaceNet
Open Images multi-label Inception v3 for text tags
Deep OCR using CTPN & CRNN
Labeled Faces in the Wild
Fine-tune a YOLO v2 detector using a custom set of regions
Start using the trained detector immediately by launching workers that process the queue assigned to the detector.
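The idea of a worker that processes the queue assigned to one detector can be sketched with Python's standard library. This is only an in-process stand-in for illustration. It makes no claim about the task queue or worker implementation the project actually uses, and `run_worker` and `toy_detector` are hypothetical names.

```python
import queue
import threading


def run_worker(task_queue, detect, results):
    """Consume frames from one detector's queue until a None sentinel arrives."""
    while True:
        frame = task_queue.get()
        if frame is None:
            break  # sentinel: no more work for this detector
        results.append((frame, detect(frame)))


def toy_detector(frame):
    """Stand-in for a trained model; returns a fake list of regions."""
    return ["region-in-" + frame]


# One queue per detector: only workers launched for this detector read it.
detector_queue = queue.Queue()
results = []

worker = threading.Thread(
    target=run_worker, args=(detector_queue, toy_detector, results)
)
worker.start()

for frame in ["frame_1", "frame_2"]:
    detector_queue.put(frame)
detector_queue.put(None)  # tell the worker to stop
worker.join()

print(results)
```

Because each detector has its own queue, a newly trained detector can be served simply by starting workers bound to that queue, without touching workers for other models.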
Deep Video Analytics is implemented using Docker and works on Mac, Windows, and Linux with the latest versions of Docker and docker-compose installed.
On the first run, you will need to wait a few minutes for the images to be pulled and the models to be downloaded.
git clone https://github.com/AKSHAYUBHAT/DeepVideoAnalytics
cd DeepVideoAnalytics/docs/tutorial && python start_tutorial.py
# Follow instructions on the screen
# You should be able to use both WebUI
# As well as jupyter notebook server with REST API
# To stop and delete containers and volumes
cd DeepVideoAnalytics/docs/tutorial && python stop_and_delete_volumes.py
You need to have the latest versions of Docker and docker-compose installed.
You need to have nvidia-docker2 and a compatible version of Docker installed.
DVA can be deployed on a Kubernetes cluster with GCS or S3 as the media store.
For a quick overview of the design choices and vision behind this project, we strongly recommend going through the following presentation.
Deep Video Analytics uses DVAPQL for processing and querying visual data in a consistent manner. The DVAPQL specification and examples can be found here.
Schroff, Florian, Dmitry Kalenichenko, and James Philbin. "Facenet: A unified embedding for face recognition and clustering." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
Zhang, Kaipeng, et al. "Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks." IEEE Signal Processing Letters 23.10 (2016): 1499-1503.
Liu, Wei, et al. "SSD: Single shot multibox detector." European Conference on Computer Vision. Springer International Publishing, 2016.
Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.
Johnson, Jeff, Matthijs Douze, and Hervé Jégou. "Billion-scale similarity search with GPUs." arXiv preprint arXiv:1702.08734 (2017).