After ripping the Twin Peaks Blu-ray, I got a bit tired of identifying the episodes by hand. This got me thinking: maybe a solution would be to audio-fingerprint episodes and put the fingerprints in a database, which could then be used to identify them, much like Shazam does for music.
I'm a Java developer, but unfortunately I didn't find many good Java libraries that cover this specific need. I did find a Python library called dejavu, though. It seems simple enough (although I've never worked with Python before).
So my idea is this.
A python script with 2 modes.
- scan mode: you pass a directory of movies to the script. The script scans the files, creates a fingerprint for the audio for each episode, and stores the show name, season number and episode number (would require a specific naming convention though).
- identify mode: pass it a bunch of files from your last rip. If identified, the file gets renamed.
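To make the two modes and the naming convention a bit more concrete, here is a minimal sketch using only the Python standard library. The convention "Show Name - S01E03.ext", the program name `episode-id`, and the helpers `parse_episode`, `target_name` and `build_parser` are all assumptions for illustration. The fingerprinting itself (for example via dejavu) is deliberately left out.

```python
import argparse
import re
from pathlib import Path
from typing import NamedTuple, Optional


class Episode(NamedTuple):
    show: str
    season: int
    episode: int


# Hypothetical naming convention: "<Show Name> - S<season>E<episode>.<ext>"
_PATTERN = re.compile(
    r"^(?P<show>.+?)\s*-\s*S(?P<season>\d+)E(?P<episode>\d+)$",
    re.IGNORECASE,
)


def parse_episode(path: str) -> Optional[Episode]:
    """Scan mode: read show, season and episode from a file name."""
    match = _PATTERN.match(Path(path).stem)
    if match is None:
        return None  # file does not follow the convention
    return Episode(
        match["show"].strip(), int(match["season"]), int(match["episode"])
    )


def target_name(ep: Episode, original: str) -> str:
    """Identify mode: build the new file name, keeping the extension."""
    suffix = Path(original).suffix
    return f"{ep.show} - S{ep.season:02d}E{ep.episode:02d}{suffix}"


def build_parser() -> argparse.ArgumentParser:
    """Command line with the two modes described above."""
    parser = argparse.ArgumentParser(prog="episode-id")
    modes = parser.add_subparsers(dest="mode", required=True)
    scan = modes.add_parser("scan", help="fingerprint a directory of named episodes")
    scan.add_argument("directory")
    identify = modes.add_parser("identify", help="identify and rename ripped files")
    identify.add_argument("files", nargs="+")
    return parser
```

In scan mode, each file would go through `parse_episode` before being fingerprinted. In identify mode, a successful match would be turned into a new file name with `target_name`.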
I'm not sure how to deal with the database. I'm thinking of simply keeping it local on the user's machine and hosting a database dump as a repo on GitHub. There could then be a command to refresh the local copy. I'm not sure how to handle user submissions, though. Having people open a PR for additions might be a bit complicated.
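One way the "refresh the local copy" step might look, sketched with SQLite from the standard library. The table layout, `open_db` and `merge_dump` are hypothetical, and treating each episode as one opaque fingerprint string is a simplification: a real fingerprinting library typically stores many hashes per file. Downloading the dump from the repo is not shown.

```python
import sqlite3
from typing import Iterable, Tuple

# (fingerprint, show, season, episode); a simplified, hypothetical row shape
Row = Tuple[str, str, int, int]


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local database on the user's machine."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS episodes ("
        " fingerprint TEXT PRIMARY KEY,"
        " show TEXT NOT NULL,"
        " season INTEGER NOT NULL,"
        " episode INTEGER NOT NULL)"
    )
    return conn


def merge_dump(conn: sqlite3.Connection, rows: Iterable[Row]) -> int:
    """Merge rows from a downloaded dump; return how many were new."""
    before = conn.total_changes
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO episodes VALUES (?, ?, ?, ?)", rows
        )
    return conn.total_changes - before
```

Because rows that are already present are ignored, running the refresh repeatedly is harmless, and locally scanned entries survive a refresh.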
Anyway, I'm curious whether anyone here has been thinking of something similar. If so, I'd love to hear your thoughts. If the idea is completely stupid and won't work, I'd love to know that too.
Using Python library dejavu to identify TV Shows
Re: Using Python library dejavu to identify TV Shows
I too am interested in using audio fingerprinting to identify the show, season and episode. After ripping a few hundred DVDs, I realized that the titles on a disc aren't always in episode order. Having a way to identify the actual season and episode would be very helpful. Did you ever make any progress with this?