doctorow at June 2nd, 2014 21:01 — #1
timothy_reeves at June 2nd, 2014 21:40 — #2
Does it fit ETAOINSHRDLU? That's what I remember the frequency of letters in English to be.
starshine at June 2nd, 2014 23:30 — #3
The most surprising thing to me is how skewed towards the end of words e is.
ratel at June 3rd, 2014 01:15 — #4
Interesting, but given English's convoluted history and the extreme mismatch between how it is spoken and how it is spelled, I doubt it says much linguistically. A phonetic breakdown along the same lines could be meaningfully compared to other languages.
catgrin at June 3rd, 2014 02:41 — #5
I work anagrams a lot, and found the graphs to mainly be what I expected. One surprise was "z" - which I quickly figured out was due to "z doubling" toward the end of some common words: dizzy, fizzy, and jazzy are three examples.
It would be a useful tool for someone learning to work anagrams or break substitution codes, because both depend on a good feel for where letters tend to sit in a word. This is basically a cheat sheet.
tsukinokemuri at June 3rd, 2014 06:26 — #6
It doesn't perfectly match ETAOINSHRDLU (which I also remember, most likely from reading Hofstadter). The highest frequency letters here (the ones with charts in the darkest red) are E, O and T; next are A, H, I, N and S; the third group, D, F and R. Compared to ETAOINSHRDLU, A is underrepresented here (or O overrepresented), and F overrepresented. (L and U both appear in the fourth-highest frequency group in the chart.)
I'm a bit surprised (and delighted) that none of the 26 letters has a relatively smooth, flat, balanced graph with roughly equal frequencies for all positions. L probably comes the closest on a quick visual examination, but the bump toward the end of the word is still more than twice as tall as the lowest point. That said, it does seem that the rarest letters (such as J, Q, X and Z) often have very sharp, unbalanced graphs.
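For anyone who wants to put a number on that "flatness" comparison, here is a minimal Python sketch. It is not how the original charts were produced: the five-bin split, the function names `positional_histogram` and `peak_to_trough`, and the choice of word list are all assumptions made for illustration.

```python
from collections import defaultdict


def positional_histogram(words, bins=5):
    """Count, per letter, how often it falls in each relative-position bin.

    Bin 0 is the start of the word and bin (bins - 1) is the end.
    The number of bins is an arbitrary choice, not taken from the charts.
    """
    hist = defaultdict(lambda: [0] * bins)
    for word in words:
        # Keep only the letters a-z so punctuation does not shift positions.
        letters = [c for c in word.lower() if "a" <= c <= "z"]
        n = len(letters)
        for i, c in enumerate(letters):
            # Map index i in a word of length n onto a bin in [0, bins - 1].
            hist[c][i * bins // n] += 1
    return dict(hist)


def peak_to_trough(counts):
    """Ratio of the tallest bin to the shortest; 1.0 means perfectly flat."""
    lowest = min(counts)
    return float("inf") if lowest == 0 else max(counts) / lowest
```

Fed a large word list, a letter with a ratio close to 1.0 would have a flat graph, while a ratio above 2.0 would match the "more than twice as tall" observation about L. Short words can leave bins empty, so the ratio is only meaningful on a reasonably large corpus.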
catgrin at June 3rd, 2014 08:08 — #7
I think "L" makes sense because it's a common letter that, like "z", often appears doubled toward the end of a word, and each doubled use counts twice. On just this page, including comments, I found: will, hopefully, all, especially, linguistically, meaningfully, basically, all, still, tall.
(I'm tired, and may have missed a few)
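A hand count like this is easy to double-check with a few lines of Python. This is only a rough sketch: the helper names `doubled_letter_words` and `double_positions` are made up for illustration, and the tokenizer simply treats any run of a-z as a word.

```python
import re


def doubled_letter_words(text, letter):
    """Return every word in text that contains the given letter doubled."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if letter * 2 in w]


def double_positions(words, letter):
    """Pair each word with the index where its first doubled letter starts."""
    return [(w, w.index(letter * 2)) for w in words if letter * 2 in w]
```

Running `doubled_letter_words(page_text, "l")` over the text of a page lists the same kind of words as above, repeats included, and `double_positions` shows how close to the end of each word the pair sits.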
telecinese at June 3rd, 2014 12:49 — #8
Oh Z, you wildcard. Surprised me as well.
boundegar at June 3rd, 2014 17:24 — #9
ETAOIN SHRDLU comes from the Linotype machine, which was probably designed using a pretty small data set. It's surprisingly close to what you see in actual English, but not precise.