Using Python library dejavu to identify TV Shows
Posted: Mon Feb 20, 2023 11:09 am
After ripping the Twin Peaks Blu-ray, I got a bit tired of identifying the episodes. This got me thinking: maybe a solution would be to audio-fingerprint episodes, store the fingerprints in a database, and then use that database to identify them, much like Shazam does for music.
I'm a Java developer, but unfortunately I didn't find many good libraries that cover this specific need. I did find a Python library called dejavu, though. It seems simple enough (although I've never worked with Python before).
So my idea is this: a Python script with two modes.
- scan mode: you pass a directory of already-named video files to the script. The script scans the files, creates a fingerprint of each episode's audio, and stores it along with the show name, season number and episode number (this would require a specific naming convention, though).
- identify mode: you pass it a bunch of files from your last rip. If a file is identified, it gets renamed.
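To make the naming-convention part concrete, here is a rough sketch of how the script could read show, season and episode from a filename in scan mode, and build the new name in identify mode. The convention "Show Name - S01E02.ext" and the names `Episode`, `parse_episode` and `renamed_path` are my own assumptions for illustration. The fingerprinting itself (the dejavu part) is deliberately left out.

```python
import re
from pathlib import Path
from typing import NamedTuple, Optional


class Episode(NamedTuple):
    show: str
    season: int
    episode: int


# Assumed convention (not from any library): "<Show Name> - S01E02.<ext>"
_PATTERN = re.compile(
    r"^(?P<show>.+?) - S(?P<season>\d+)E(?P<episode>\d+)$", re.IGNORECASE
)


def parse_episode(path: Path) -> Optional[Episode]:
    """Scan mode: read the metadata from a correctly named file, or None."""
    match = _PATTERN.match(path.stem)
    if match is None:
        return None
    return Episode(
        match["show"].strip(), int(match["season"]), int(match["episode"])
    )


def renamed_path(original: Path, ep: Episode) -> Path:
    """Identify mode: build the target name, keeping directory and extension."""
    new_name = f"{ep.show} - S{ep.season:02d}E{ep.episode:02d}{original.suffix}"
    return original.with_name(new_name)
```

In identify mode, the result of a successful lookup would be turned into an `Episode` and passed to `renamed_path`; the actual rename could then be `original.rename(renamed_path(original, ep))`.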
Not sure how to deal with the database. I'm thinking of simply keeping it local on the user's machine and hosting a database dump as a repo on GitHub. You could then have a command to refresh the local copy. Not sure how to deal with user submissions, though. Having people open a PR for additions might be a bit complicated.
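For the "refresh the local copy" command, one option is a local SQLite file where rows from a downloaded dump are merged in without overwriting what is already there. This is only a sketch under my own assumptions: the table layout, the single `fingerprint` column and the names `open_store` and `merge_dump` are made up, and it does not reflect how dejavu actually stores its fingerprints.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical row shape: (show, season, episode, fingerprint)
Row = Tuple[str, int, int, str]


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local database with one row per episode."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS episodes (
               show TEXT NOT NULL,
               season INTEGER NOT NULL,
               episode INTEGER NOT NULL,
               fingerprint TEXT NOT NULL,
               PRIMARY KEY (show, season, episode)
           )"""
    )
    return conn


def _count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM episodes").fetchone()[0]


def merge_dump(conn: sqlite3.Connection, rows: Iterable[Row]) -> int:
    """Insert rows from a dump, skipping episodes that already exist locally.

    Returns the number of rows that were actually added.
    """
    before = _count(conn)
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT OR IGNORE INTO episodes VALUES (?, ?, ?, ?)", rows
        )
    return _count(conn) - before
```

Because existing rows are skipped rather than replaced, running the refresh twice is harmless, and local additions survive a refresh. The same function could also take a user-submitted batch, which might make submissions easier to handle than a PR per episode.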
Anyway, I'm curious if anyone here has been thinking of something similar. If so, I'd love to hear your thoughts. If the idea is completely stupid and won't work, I'd love to know that too.