identify-media

# Analyse Media File

This library is written to help streamline getting information about, and subtitles for, backups of DVDs and Blu-rays. It uses only string methods, so that it stays compatible with both browser and desktop applications, to build search query data for various media API backends. The library is written in TypeScript and includes types that are easily converted to TMDB, OMDB and IMDB search queries, as well as media hash coding compatible with the OpenSubtitles API.

## Analysing Functions

`makeHash` returns a promise of a 16-character hex string that can be used in various media APIs to search for content about a media file. Most commonly it is used to find subtitles via https://opensubtitles.org when your DVD backup did not include subtitles in your preferred language.

`analyseFilePath` returns a structured output of the included type `AnalysedMedia`, which is a union of `AnalysedMovie`, `AnalysedTVShow` or a `string`. The `string` contains just the guessed title, and is returned when the name and path formatting from your backup program could not be identified.

`AnalysedMovie` is structured as follows:

```typescript
export interface AnalysedMovie {
  type: 'movie';
  year?: number;
  name: string;
}
```

`name` is the guessed name, based on common patterns from backup programs, and `year` is the guessed release year.

`AnalysedTVShow` is structured as follows:

```typescript
export interface AnalysedTVShow {
  type: 'tv';
  name: string;
  season?: number;
  episodes?: number[];
  year?: number;
}
```

`name` is the guessed name, based on common patterns from backup programs. `season` and `episodes` are the guessed season and episodes; multiple episodes encoded in the same file come as multiple entries in the `episodes` array. `year` is the year of the first air date.

## Usage

This library works in the browser or in Node and depends on boilerplate code to read the file content and the file path.

### Browser

Using `FileWithPath` (e.g.
from React-dropzone), getting search data is done as:

```typescript
const [analysed, setAnalysed] = useState<AnalysedMedia[]>([]);
const onDrop = useCallback((files) => {
  setAnalysed(files.map((file) => analyseFilePath(file.path)));
}, []);
const {...} = useDropzone({onDrop, accept: 'video/*'});
...
```

Getting the media hash takes a promise-based `FileReader` wrapper, like:

```typescript
const HASH_CHUNK_SIZE = 65536; // 64 * 1024, defined by MediaHash
const [hashes, setHashes] = useState<string[]>([]);

// Simple promise wrapper for FileReader.
// A positive block reads the first `block` bytes, a negative block reads the last `-block` bytes.
const readBlock = useCallback((file: File, block: number): Promise<string> => {
  return new Promise<string>((resolve, reject): void => {
    const reader = new FileReader();
    reader.onload = (event) => {
      if (event.target !== null) {
        resolve(event.target.result as string);
      } else {
        reject(event);
      }
    };
    reader.onerror = (error) => {
      reject(error);
    };
    if (block < 0) {
      reader.readAsBinaryString(file.slice(block));
    } else {
      reader.readAsBinaryString(file.slice(0, block));
    }
  });
}, []);

const onDrop = useCallback((files) => {
  // makeHash uses file size and the first and last 64K chunk of the file.
  Promise.all(files.map((file) => makeHash(file.size, readBlock(file, HASH_CHUNK_SIZE), readBlock(file, -HASH_CHUNK_SIZE))))
    .then(setHashes);
}, [readBlock]);
const {...} = useDropzone({onDrop, accept: 'video/*'});
...
```

Aside from these functions, a few helper functions are included. `isAnalysedMovie` and `isAnalysedTVShow` are identity checks for the two analysed types. `isExtras` and `isSample` tell whether the file is likely an extra or a sample. Finally, `isSameRelease` tells whether two instances of `AnalysedMedia` are likely the same movie or TV show (season and episode are excluded from the comparison). These functions can help cut down on the number of requests to APIs, for example by grouping requests for TV shows and only getting individual episodes as needed, and by not requesting subtitles for sample files.

## API Functions

API functions to search TMDB, OMDB and OpenSubtitles.
The functions are separated into mapper functions that map `AnalysedMedia` into queries for these APIs, and search, get and find methods that return Axios-compatible request config objects.

There are also mapper functions that map TMDB and OMDB results into a subset of data called `Media`, a union of the `Movie` and `TVShow` types:

```typescript
export interface MediaInfo {
  plot?: string;
  images: Record<string, string>;
}

export interface Movie {
  type: 'movie';
  imdbId?: string;
  tmdbId?: number;
  title: string;
  release?: string;
  mediaInfo: MediaInfo;
}

export interface TVShow {
  type: 'tv';
  imdbId?: string;
  tmdbId?: number;
  name: string;
  firstAirDate?: string;
  mediaInfo: MediaInfo;
}
```

To use these, map an `AnalysedMedia` object to a search query and pass it to one of the search functions to create an Axios-compatible request object, like this:

```typescript
Axios.request(searchTmdb(mapTmdbQuery(media), this.apiKey))
  .then((response) => response.data)
```

This will return a `TmdbSearchResponse` object of the following kind:

```typescript
export interface TmdbSearchResponse {
  page: number;
  total_results: number;
  total_pages: number;
  results: Array<TmdbMovieResult|TmdbTVShowResult>;
}
```

The `release` and `firstAirDate` fields of `Movie` and `TVShow` are formatted as `yyyy-mm-dd`. There is a specialised `mergeMedia` function that takes two `Media` objects and returns the combined result. This can be used to fetch from both TMDB and OMDB and merge the results into a single object.

## Future development

Further methods could include looking for metadata inside files, as some backup programs put useful information there, and checking for already existing subtitles before querying https://opensubtitles.org.
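
### Appendix: what a media hash looks like

As background to the `makeHash` description above, here is a minimal sketch of the commonly documented OpenSubtitles-style hash: the file size plus the sum of every 64-bit little-endian word in the first and last 64K chunk, wrapped at 2^64 and printed as 16 hex characters. This is not the library's own implementation. The names `sketchMediaHash` and `addWords` are hypothetical, and the sketch assumes the chunks are passed as `Uint8Array` values and that any trailing bytes that do not fill a whole 8-byte word are ignored.

```typescript
const WORD_BYTES = 8;

// Hypothetical helper: adds every complete 8-byte little-endian word of
// `chunk` into `acc` (8 little-endian bytes), wrapping at 2^64.
function addWords(acc: number[], chunk: Uint8Array): void {
  const usable = chunk.length - (chunk.length % WORD_BYTES);
  for (let offset = 0; offset < usable; offset += WORD_BYTES) {
    let carry = 0;
    for (let i = 0; i < WORD_BYTES; i++) {
      const sum = acc[i] + chunk[offset + i] + carry;
      acc[i] = sum & 0xff;
      carry = sum >> 8;
    }
    // A carry out of the top byte is dropped, which is the wrap at 2^64.
  }
}

// Hypothetical sketch of the hash, not the library's makeHash.
function sketchMediaHash(fileSize: number, head: Uint8Array, tail: Uint8Array): string {
  // Seed the accumulator with the file size, least significant byte first.
  const acc: number[] = [];
  let remaining = fileSize;
  for (let i = 0; i < WORD_BYTES; i++) {
    acc.push(remaining % 256);
    remaining = Math.floor(remaining / 256);
  }

  addWords(acc, head);
  addWords(acc, tail);

  // Render most significant byte first, two hex digits per byte.
  let hex = '';
  for (let i = WORD_BYTES - 1; i >= 0; i--) {
    hex += (acc[i] < 16 ? '0' : '') + acc[i].toString(16);
  }
  return hex;
}
```

Doing the addition byte by byte with an explicit carry keeps the sketch free of `BigInt`, so it should compile under older TypeScript targets. In real use `head` and `tail` would be the first and last 65536 bytes of the file.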