What can word frequency tell us about culture?

I've mentioned before that I'm in the process of overhauling my Key Stage 2 scheme of work for Spanish.  So far I've filled in my template for units 1-4 and a bit of 5.

In accordance with the TSC MFL Pedagogy Review and the new Ofsted Framework, I am setting out the language to be learned in each unit under the headings of Vocabulary, Grammar and Phonics.  I am colour-coding the vocabulary to see the spread more easily.

I'm also adding the frequency of each word to ensure that I am focussing on high-frequency vocabulary (see the presentation by Rachel Hawkes about this).

I've purchased the Routledge Frequency Dictionary of Spanish, which lists the top 5000 words as well as giving topic-based lists of words like colours, family words, verbs and so on.  The frequency values of some words have proved interesting.  I think that the ranking of certain words tells us something about Hispanic culture.

The box at the top shows the vocabulary and frequency for my unit 4, which is pencil case and gender.  I was interested to see that bolígrafo/boli, which I have always taught for 'pen', wasn't listed, while pluma comes in at number 2605. 

There then followed an interesting discussion on Twitter about boli vs pluma.  Here in the UK we teach castellano, so we teach bolígrafo/boli, as that is the word commonly used in Spain.  In Central and South America, however, there is a variety of different words, as this infographic shows:

While I've been looking up other words (usually while waiting for Zoom meetings to start!) I've come across some other interesting things which I think tell us about Hispanic culture:
  • Veinte is #819 and treinta is #829.  Veinticinco is #2643 and veinticuatro is #4059.  None of the other 20s feature in the top 5000.  I can see why veinticinco would be quite frequent, but what makes veinticuatro more common than, say, veintiuno?
  • Café is much higher than (#961 vs #2552)
  • Domingo is much higher on the list than all the other days of the week (lunes #1370; martes #3101; miércoles #1816; jueves #1650; viernes #1259; sábado #1179; domingo #693)
  • Julio is the highest month on the list (#659; the next month down is agosto at #931)
  • Perro (#888) is considerably higher than gato (#1728), perhaps reflecting the preferred pet in Spanish-speaking countries (the dictionary uses data from all of them) or indeed the frequency with which those animals appear in phrases and sayings.
  • Hermana (#3409) is much lower than hermano (#333), probably because of the use of hermano(s) to mean sibling(s), but abuela (#783) is much higher than abuelo (#4796).
I'm sure I will unearth some more interesting ones as I complete more units!