Ever wondered which languages are used in films? This article looks at the Movie Database dataset, which contains metadata for 45,000 movies, and analyses the languages used in each film.
The dataset covers 93 languages. Not surprisingly, English is the overwhelming majority.
Let’s take a look at the top 5 languages in raw form:
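A minimal sketch of how a raw top-5 count could be produced in Python. It assumes each film carries a single ISO 639-1 language code (a column such as original_language is an assumption here), and the sample values are invented purely for illustration, not taken from the dataset:

```python
from collections import Counter

# Hypothetical sample: one language code per film, standing in for the
# dataset's original-language column (values invented for illustration).
original_languages = ["en", "en", "fr", "en", "ja", "hi", "en", "fr", "ja", "en", "it"]


def language_counts(codes):
    """Count how many films use each language code."""
    return Counter(codes)


counts = language_counts(original_languages)
print(len(counts))            # number of distinct languages represented
print(counts.most_common(5))  # top 5 languages with their film counts
```

Counter.most_common(n) returns (code, count) pairs from most to least frequent, so the same call with n=25 gives the data for a top-25 chart.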
What about looking at this in graphical form, this time with the top 25 languages:
The English majority makes the rest of this graph hard to read. Let’s take English out and take a closer look at the remaining languages.
As you can see, Japanese and Hindi make up the majority as far as Asian languages are concerned.
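Taking English out before charting amounts to filtering it out before picking the most common entries. The sketch below assumes the same kind of per-film language codes; the helper name top_languages and the sample values are hypothetical:

```python
from collections import Counter

# Hypothetical per-film language codes (invented for illustration).
codes = ["en"] * 8 + ["fr", "fr", "ja", "ja", "ja", "hi", "hi", "es"]


def top_languages(codes, n, exclude=()):
    """Return the n most common languages, skipping any codes in `exclude`."""
    counts = Counter(c for c in codes if c not in exclude)
    return counts.most_common(n)


# Dropping English lets the smaller languages stand out when plotted.
top_without_english = top_languages(codes, 25, exclude={"en"})
print(top_without_english)
```

Filtering before counting (rather than deleting English from the finished chart) means the top 25 is recomputed from the remaining languages, so the chart still shows 25 bars when enough languages exist.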
How does this compare with the languages actually spoken within each movie?
To do this, I converted the spoken_languages column in the dataset into a numeric value: the number of languages spoken in each film.
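One way to do this conversion, sketched in Python under the assumption that each spoken_languages cell is a string holding a Python-style list of dictionaries, for example "[{'iso_639_1': 'en', 'name': 'English'}]". Because such strings use single quotes, ast.literal_eval is used rather than json. The helper name and the sample rows are hypothetical:

```python
import ast


def count_spoken_languages(cell):
    """Convert one spoken_languages cell into the number of languages it lists.

    Assumes the cell is a string such as
    "[{'iso_639_1': 'en', 'name': 'English'}]"; missing or malformed
    values are counted as 0.
    """
    if not isinstance(cell, str):
        return 0
    try:
        parsed = ast.literal_eval(cell)
    except (ValueError, SyntaxError, TypeError):
        return 0
    return len(parsed) if isinstance(parsed, list) else 0


# Hypothetical rows (titles and values invented for illustration).
rows = {
    "Film A": "[{'iso_639_1': 'en', 'name': 'English'}]",
    "Film B": "[{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'fr', 'name': 'Français'}]",
    "Film C": "[]",
    "Film D": None,
}

per_film = {title: count_spoken_languages(cell) for title, cell in rows.items()}
# Rank films by how many languages they list, most first.
ranked = sorted(per_film.items(), key=lambda item: item[1], reverse=True)
print(ranked)
```

Missing or unparseable cells are counted as 0 here; dropping them instead may be preferable before any correlation analysis, since a 0 would look like a film with no spoken language at all.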
The data shows that most movies have just one spoken language, whilst the highest number of languages spoken in any single film is 19. The films with the most spoken languages are:
- Visions of Europe, 2004, 19 languages
- The Testaments, 2000, 13 languages
- To Each His Own Cinema, 2007, 12 languages
- The Adventures of Picasso, 1978, 10 languages
However, Visions of Europe is actually a collection of 25 short films, which explains its high number of languages.
Let’s see whether there is a correlation between a film’s return and the number of languages spoken in it.
With a p-value of 0.19 and a Spearman r value of 0.018, there’s no statistically significant correlation between a movie’s success and the number of languages spoken in it.
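For readers who want to run this kind of test themselves, here is a self-contained sketch of Spearman’s rank correlation. It assumes a film’s "return" is revenue divided by budget (films with no recorded budget are skipped), which may not be the exact definition used in this analysis, and the sample films are invented. Ties are given averaged ranks, which matters here because many films share the same language count. The p-value uses a large-sample normal approximation; scipy.stats.spearmanr computes both values directly if SciPy is available:

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position holding the same value as position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; requires both inputs to have non-zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho (Pearson on average ranks) plus an approximate two-sided p-value."""
    rho = pearson(average_ranks(x), average_ranks(y))
    rho = max(-1.0, min(1.0, rho))  # guard against floating-point overshoot
    n = len(x)
    if abs(rho) == 1.0:
        return rho, 0.0
    t = rho * math.sqrt((n - 2) / (1 - rho ** 2))
    # Normal approximation to Student's t with n - 2 degrees of freedom;
    # only reasonable for large n.
    p = math.erfc(abs(t) / math.sqrt(2))
    return rho, p


# Hypothetical films: (spoken-language count, budget, revenue), invented for illustration.
films = [
    (1, 10e6, 25e6),
    (2, 5e6, 4e6),
    (1, 20e6, 60e6),
    (3, 8e6, 12e6),
    (1, 0, 1e6),  # no recorded budget, so it is skipped below
    (2, 15e6, 45e6),
]

# Assumed definition of "return": revenue divided by budget.
usable = [(langs, revenue / budget) for langs, budget, revenue in films if budget > 0]
language_counts = [langs for langs, _ in usable]
returns = [ret for _, ret in usable]

rho, p = spearman(language_counts, returns)
print(f"Spearman r = {rho:.3f}, approximate p = {p:.3f}")
```

Spearman’s coefficient only asks whether return tends to rise or fall as the language count rises, so it is not thrown off by a handful of films with extreme returns.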
In conclusion, this data gives no way of connecting a film’s success with the number of languages spoken in it.
This article doesn’t look at how languages trend over time or by genre, or at the impact of those trends; these could be topics for a future article.
This post was written by noxford