Linguistic Reform
Mar. 10th, 2015 09:33 pm

I was reading this article in The Atlantic over the weekend; it insinuates that the reason the US doesn't have more engineers, doctors, teachers, lawyers, etc. is that, as kids, we spend too much of our time learning to read and write, memorizing and internalizing all of the pits and foibles of the English language.
I wasn't very impressed with their suggestion that spelling reform would instantly transform the US into a powerhouse of capitalism and science, but the idea of using tech to solve the problem appealed to me.
I started thinking about how Japanese children learn kanji by reading furigana, which uses kana (think Japanese alphabet) printed in small type above a character to sound it out, allowing them to pick up the meaning from context if they recognize the sound of the word or idea. Of course, we can't really attach a parenthetical gloss showing how a word sounds every time we write something with a complicated or illogical spelling, but we can use text replacement.
I copied the first paragraph from the Wikipedia article on the IPA and started writing a correlation guide for the sounds. I had to make a few conscious decisions:
- I was going to design a phased implementation, where every couple of decades, the next step towards a logical orthography would be taken
- The first phase of the reforms would not touch vowels or semi-vowels; they're too damn complicated, and changing them would make the first round incomprehensible to first-generation converts
- The letter "c" would be used for the "hard-c" sound, instead of "k"--K is commonly used because it is unambiguous in English; however, I think there is a greater stigma to see a large number of Ks in written English, and using C would improve the theoretical uptake
- The letters x and q would not be used; they are, however, reserved for possible future purposes (for example... filling out those damn vowels)
- Morphemes that are used as affixes or in combining forms would keep the same spellings, to preserve the connection between meanings. A side effect is that the pronunciations of some words might morph.
- E.g., "northern" would use the same "th" as "north", even though the phoneme is voiced in "northern" and unvoiced in "north", a distinction my reforms would otherwise spell out.
I tried this out by manually converting a short paragraph, but after reading it, I felt the text was too familiar to me to give the real experience, so I set about automating the conversion of the rest of the article.
First, I used grep to extract all the unique words in the article (which also picked up foreign words, IPA characters, and other non-word content) into a text file. Then I opened it in a spreadsheet program and added a column for the future transformations: the phase-1 spelling for every word that needed one, plus the phase-2 or phase-3 spellings for a few I could already see (phase 2 involves removing doubled consonants). I sorted by column 2, pasted all of the words that had phase-1 modifications back into a text document, and wrote a regex pattern to turn the file into something I could feed into sed. Finally, I used sed to transform the entire article into the phase-1 orthography.
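In shell terms, the pipeline boils down to something like this (a rough sketch: the file names are placeholders, and the awk line stands in for the regex step that builds the sed script):

    # Pull every unique word out of the article; this also catches foreign
    # words and stray IPA characters, which simply never get a replacement.
    grep -oE "[A-Za-z']+" article.txt | sort -u > words.txt

    # ...fill in the replacement column in a spreadsheet, then export the
    # phase-1 pairs as tab-separated "word<TAB>respelling" lines (pairs.tsv)...

    # Turn each pair into a sed substitution, using GNU sed's \< \> word boundaries.
    awk -F'\t' '{ printf "s/\\<%s\\>/%s/g\n", $1, $2 }' pairs.tsv > phase1.sed

    # Apply the whole substitution script to the article in one pass.
    sed -f phase1.sed article.txt > article-phase1.txt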
I think the only mental leaps needed to parse this easily are knowing that the digraph "dh" is the voiced "th" (as in "northern"), and that "c" is always the hard-c sound.
Other changes (a rough sed sketch of the mechanical ones follows the list):
- TION -> CHION
- SION -> SHION
- Soft G -> J
- "French J" / voiced SH -> ZH (e.g., closure)
- X -> CS or CZ, depending on voicing
- Soft C -> S
- QU -> CW or CU
- PH -> F
- "of" -> "ov"
- "is" -> "iz"
- Silent letters (the G and K of GN, KN) omitted
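To make those concrete, the purely mechanical rules look something like this as GNU sed commands (a sketch only, lowercase forms, placeholder file names). The pronunciation-dependent ones (soft c and g, x, qu) need per-word decisions, which is why the real pass worked from the word-by-word table rather than blind patterns:

    # The mechanical subset of the rules above. Even these have exceptions
    # ("uphill" should not become "ufill"), so they are a starting point,
    # not a final global pass.
    sed -E \
        -e 's/tion/chion/g' \
        -e 's/sion/shion/g' \
        -e 's/ph/f/g' \
        -e 's/\<of\>/ov/g' \
        -e 's/\<is\>/iz/g' \
        article.txt > article-sketch.txt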