MongoDB-TV

GridFS + Full text search

Created by Matias Cascallares

GridFS Overview

Specification to store files inside MongoDB

Nothing new!

  • fs.files
  • fs.chunks

    fs.files

    
    
    var file = {
    	_id : ObjectId("52669591c3d732d92d000002"),
    	filename : "52668b8924319b2524ed86ab_1_1",
    	contentType : "binary/octet-stream",
    	length : 160722190,
    	chunkSize : 8388608,
    	uploadDate : ISODate("2013-10-22T15:11:22.138Z"),
    	aliases : null,
    	metadata : {
    		contentType : "video/quicktime"
    	},
    	md5 : "85217913f05c231b1eac9e1c1492971f"
    }
    
    						

    fs.chunks

    
    
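    var chunk = {
    	_id : ObjectId("52669591c3d732d92d000003"),
    	n : 0, // chunk sequence number, starting at 0
    	files_id : ObjectId("52669591c3d732d92d000002"), // _id of the fs.files document
    	data : '' // binary payload, up to chunkSize bytes
    }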
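
The two collections are tied together by `files_id` and `n`, so finding the chunk that holds a given byte is plain arithmetic. A minimal sketch, assuming a file document shaped like the one above; `locateByte` and `chunkQuery` are hypothetical helpers, not driver functions:

```javascript
// Map a byte offset inside a GridFS file to the chunk that holds it.
// Hypothetical helper for illustration only.
function locateByte(file, offset) {
    if (offset < 0 || offset >= file.length) {
        throw new RangeError('offset outside file');
    }
    return {
        n: Math.floor(offset / file.chunkSize),  // matches fs.chunks.n
        offsetInChunk: offset % file.chunkSize   // position inside chunk.data
    };
}

// Build the query document that selects that chunk in fs.chunks.
function chunkQuery(file, offset) {
    return { files_id: file._id, n: locateByte(file, offset).n };
}
```

With the sample file above (160722190 bytes, 8 MB chunks) the last byte lands in chunk `n = 19`, so the file spans 20 chunk documents.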
    
    						

    The Idea

    • A Node.js web application
    • Store any video in MongoDB using GridFS
    • Stream the videos to the browser
    • Store subtitles as documents so they can be searched with full text search

    Schema

  • Video file -> GridFS
  • Show and episode metadata -> Document
  • Subtitles -> Document

    Show and episode metadata

    
    var show = {
        _id: ObjectId("528335aecfc1b246d7686c94"),
        name: "M102 for DBAs",
        episodes: [
            {
                _id: ObjectId("528335aecfc1b246d7686c95"),
                created: new Date(),
                season: 4,
                number: 3,
                video: "filename1.mp4" // filename in GridFS
            },
            // ...
        ]
    };
    						

    Subtitle metadata

    
    var subtitle = {
        _id: ObjectId("528335aecfc1b246d7686c96"),
        episode: ObjectId('7834634dca45'), // refers to the episode in show.episodes._id
        start: 1234, // when the subtitle appears
        end: 1250,   // when the subtitle disappears
        text: 'A replica set in MongoDB is a group of mongod processes...'
    };
    
    subtitleSchema.index({text : 'text'});
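
Once the text index exists, a search is a single command. A rough sketch, assuming the 2.4-style `text` command document and its `results: [{ score, obj }]` reply shape; `buildTextCommand` and `toHits` are hypothetical helpers, and the collection name and the default limit of 10 are my own choices:

```javascript
// Build the document for the 2.4 "text" command (assumed shape).
function buildTextCommand(collection, phrase, limit) {
    return { text: collection, search: phrase, limit: limit || 10 };
}

// Turn a command reply into "jump to this moment" hits for the player.
// Assumes each result carries the matched subtitle in `obj`.
function toHits(reply) {
    return (reply.results || []).map(function (r) {
        return {
            episode: r.obj.episode,
            start: r.obj.start,
            end: r.obj.end,
            score: r.score
        };
    });
}
```

The command document would be handed to the driver's generic command call; each hit gives the episode and the position to seek to.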
    
    						

    Ok... show me!

    Some considerations

    Enable text search

    Text search is not enabled by default in 2.4.x; you have to turn it on explicitly when starting mongod

    
    mongod --dbpath data --setParameter textSearchEnabled=true --fork
    						

    GridFS chunk size

    • 256 KB by default: a 300 MB file -> 1200 chunk docs
    • Increased to 8 MB: a 300 MB file -> 38 chunk docs
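
The numbers above are just a ceiling division. A quick sketch to check them; `chunkCount` is a hypothetical helper:

```javascript
var KB = 1024;
var MB = 1024 * KB;

// Number of fs.chunks documents needed for a file of the given size.
function chunkCount(lengthBytes, chunkSizeBytes) {
    return Math.ceil(lengthBytes / chunkSizeBytes);
}

var withDefault = chunkCount(300 * MB, 256 * KB); // 1200
var withLarger = chunkCount(300 * MB, 8 * MB);    // 38 (37.5 rounded up)
```

Fewer, larger chunks mean fewer documents to fetch per video, at the cost of reading more data per chunk when the client only needs a small range.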

    HTTP is your friend

    GridFS already stores what HTTP needs for caching, expiration and validation (md5, uploadDate, length, contentType). Send it to the client to reduce database load
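
One way validation could look: answer 304 when the browser already has the file. A minimal sketch, assuming `metadata` is the fs.files document and request header names are lower-cased as Node.js does; `cacheHeaders` and `isFresh` are hypothetical helpers and the one-hour max-age is an arbitrary assumption:

```javascript
// Response headers derived from the fs.files document.
function cacheHeaders(metadata) {
    return {
        'ETag': '"' + metadata.md5 + '"',
        'Last-Modified': metadata.uploadDate.toUTCString(),
        'Cache-Control': 'public, max-age=3600' // assumption: tune per deployment
    };
}

// True when the client's cached copy is still valid (reply 304, skip GridFS).
function isFresh(reqHeaders, metadata) {
    var etag = '"' + metadata.md5 + '"';
    var noneMatch = reqHeaders['if-none-match'];
    if (noneMatch) {
        return noneMatch.split(',').some(function (tag) {
            tag = tag.trim();
            return tag === '*' || tag === etag || tag === 'W/' + etag;
        });
    }
    var since = Date.parse(reqHeaders['if-modified-since'] || '');
    if (!isNaN(since)) {
        // HTTP dates have one-second resolution, so drop the milliseconds
        var uploaded = Math.floor(metadata.uploadDate.getTime() / 1000) * 1000;
        return uploaded <= since;
    }
    return false;
}
```

When `isFresh` returns true the handler can send `304` with no body and never open the chunk stream.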


    
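    if (metadata && stream) {
        res.status(200);
        res.set({
            'Accept-Ranges': 'bytes',
            'Content-Type': metadata.contentType,
            'Content-Length': metadata.length,
            // ETag values are quoted strings; the GridFS md5 is a ready-made validator
            'ETag': '"' + metadata.md5 + '"',
            // HTTP dates must be in GMT format, which toUTCString() produces
            'Last-Modified': metadata.uploadDate.toUTCString()
        });
        stream.pipe(res);
    } else {
        res.status(404).send('Not found');
    }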
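
Advertising `Accept-Ranges: bytes` means the player will send `Range` requests when the user seeks. A sketch of how a single-range header could be mapped onto GridFS chunks, assuming `metadata` carries `length` and `chunkSize`; `parseRange` and `rangeResponse` are hypothetical helpers and multi-range requests are deliberately not handled:

```javascript
// Parse "bytes=start-end", "bytes=start-" or "bytes=-suffix".
// Returns null for anything malformed or unsatisfiable.
function parseRange(header, length) {
    var m = /^bytes=(\d*)-(\d*)$/.exec(header || '');
    if (!m || (m[1] === '' && m[2] === '')) return null;
    var start, end;
    if (m[1] === '') {
        // suffix form: the last N bytes of the file
        var n = parseInt(m[2], 10);
        if (n === 0) return null;
        start = Math.max(length - n, 0);
        end = length - 1;
    } else {
        start = parseInt(m[1], 10);
        end = m[2] === '' ? length - 1 : Math.min(parseInt(m[2], 10), length - 1);
    }
    if (start > end || start >= length) return null;
    return { start: start, end: end };
}

// Status, headers and the span of fs.chunks documents needed for the range.
function rangeResponse(range, metadata) {
    return {
        status: 206,
        headers: {
            'Content-Range': 'bytes ' + range.start + '-' + range.end + '/' + metadata.length,
            'Content-Length': range.end - range.start + 1
        },
        firstChunk: Math.floor(range.start / metadata.chunkSize),
        lastChunk: Math.floor(range.end / metadata.chunkSize)
    };
}
```

Only chunks `firstChunk` through `lastChunk` have to be read, which is why seeking into a long video does not need to touch the whole file.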
        					

    Performance

    Meny Meny

    Source code

    https://github.com/mcascallares/MongoDB-TV

    DISCLAIMER: This is a hackathon-style project; it is not the best code you can find out there

    with a little help from my friends

    THE END

    Thanks