rubah: (Default)
[personal profile] rubah
I was reading this article in The Atlantic over the weekend; it insinuates that the reason the US doesn't have more engineers, doctors, teachers, lawyers, etc. is that, as kids, we spend too much of our time learning to read and write, memorizing and internalizing all of the pitfalls and foibles of the English language.

I wasn't very impressed with their suggestion that spelling reform would instantly transform the US into a powerhouse of capitalism and science, but the idea of using tech to solve the problem appealed to me.

I started thinking about how Japanese children learn kanji by reading furigana, which uses kana (think Japanese alphabet) superscripted over a character to sound it out, allowing them to pick up the meaning from context if they recognize the sound of the word or idea. Of course, we can't really add a parenthetical explanation every time we write a word with a complicated or illogical spelling, but we can use text replacement.

I copied the first paragraph from the Wikipedia article on IPA and started writing a guide mapping sounds to spellings. I had to make a few conscious decisions:
  1. I was going to design a phased implementation, where every couple of decades, the next step towards a logical orthography would be taken

  2. The first phase of the reforms would not involve vowels, nor semi-vowels; they're too damn complicated to make the first changes comprehensible to first-generation converts

  3. The letter "c" would be used for the "hard-c" sound, instead of "k". K is commonly used because it is unambiguous in English; however, I think there is a greater stigma attached to seeing a large number of Ks in written English, and using C would improve the theoretical uptake

  4. The letters x and q would not be used; they are, however, reserved for possible future purposes (for example... filling out those damn vowels)

  5. Morphemes that are used as affixes or in combining forms would use the same spellings to preserve the connection between meanings. A side effect is that the pronunciation of some words might shift over time to match the spelling.
    • E.g., "Northern" would use the same "th" as "North", even though the phoneme is voiced in "Northern" and unvoiced in "North", and my reforms would otherwise show that distinction.


I tried this out by manually replacing the text in a short paragraph. But after reading it, I felt the text was too familiar to me to give the real experience. So I set about automating the conversion of the rest of the article.

First, I used grep to extract all the unique words in the article (which also picked up foreign words, IPA characters, and other non-word content) to a text file. Then, I opened it in a spreadsheet program and added a column that contained all of the future transformations. I included the phase 1 transformation for every word that needed one, and the phase 2 or phase 3 transformations for a few that I could see (phase 2 involves removing duplicated consonants). I sorted by column 2, pasted all of the words that had phase 1 modifications back into a text document, and wrote a regex pattern to turn the file into something I could feed into sed. Finally, I used sed to transform the entire article into the phase 1 orthography.
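The word-list approach described above can be sketched in a few lines of Python. This is only a rough stand-in for the grep, spreadsheet, and sed steps, not the commands that were actually run. The names `unique_words`, `apply_word_map`, and `WORD_MAP` are made up for illustration, and the four sample entries are guesses at what the spreadsheet's phase 1 column might contain.

```python
import re

# Matches runs of letters (any script), so IPA symbols and foreign
# words are picked up too, much as the grep pass did.
WORD = re.compile(r"[^\W\d_]+")

def unique_words(text):
    """Return the sorted set of lowercased words found in text."""
    return sorted({w.lower() for w in WORD.findall(text)})

# Hypothetical stand-in for the spreadsheet: original word -> phase 1 spelling.
WORD_MAP = {
    "the": "dhe",
    "of": "ov",
    "is": "iz",
    "phonetic": "fonetic",
}

def apply_word_map(text, word_map):
    """Swap every whole word that has an entry in word_map."""
    def swap(match):
        word = match.group(0)
        new = word_map.get(word.lower())
        if new is None:
            return word                      # no phase 1 change for this word
        if len(word) > 1 and word.isupper():
            return new.upper()               # ALL CAPS stays all caps
        if word[0].isupper():
            return new[0].upper() + new[1:]  # keep an initial capital
        return new
    return WORD.sub(swap, text)

print(unique_words("The cat and the hat"))
print(apply_word_map("The sound of the word is phonetic.", WORD_MAP))
```

Because the replacement works on whole words, a mapping for "is" never touches the "is" inside "this". One known gap in this sketch: the word pattern splits on apostrophes, so "doesn't" is seen as "doesn" and "t".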

I think the only mental leaps we would need to make to parse this easily are knowing that the digraph "dh" is the voiced "th" (as in "Northern"), and that "c" is always a hard-c sound.

Other changes:
  • TION -> CHION

  • SION -> SHION

  • Soft G -> J

  • "French J" / "voiced SH" -> ZH (e.g. closure)

  • X -> CS or CZ, depending on voicedness

  • Soft C -> S

  • QU -> CW or CU

  • Ph -> F

  • "Of" -> "Ov"

  • "Is" -> "Iz"

  • Silent letters (G and K of GN, KN) omitted
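
A few of the changes above are regular enough to approximate with plain pattern rules instead of a per-word list. The Python sketch below is an assumption-heavy illustration, not the method used for the article: `RULES` and `phase_one` are made-up names, it handles lowercase text only, and it guesses at pronunciation from spelling alone. That means it will get some words wrong (the "ph" in "uphill", the "sion" in "vision", the "c" in "special"), which is exactly why a hand-checked word list is safer. Soft G, the ZH sound, X, and the voiced "th" are left out because spelling alone doesn't reveal them.

```python
import re

# Ordered (pattern, replacement) pairs covering only the more
# mechanical rules. Every pattern is a spelling-based guess.
RULES = [
    (r"\bof\b", "ov"),        # "Of" -> "Ov"
    (r"\bis\b", "iz"),        # "Is" -> "Iz"
    (r"\b[gk]n", "n"),        # silent G/K in word-initial GN, KN
    (r"tion", "chion"),       # TION -> CHION
    (r"sion", "shion"),       # SION -> SHION
    (r"ph", "f"),             # Ph -> F
    (r"qu", "cw"),            # QU -> CW (the CU variant is not handled)
    (r"c(?=[eiy])", "s"),     # soft C heuristic: c before e, i, or y
]

def phase_one(text):
    """Apply each rule in order to lowercase text."""
    for pattern, replacement in RULES:
        text = re.sub(pattern, replacement, text)
    return text

print(phase_one("the knight of the nation is quick"))
print(phase_one("phonetic cent"))
```

The soft C rule runs last so that it never sees the "c" introduced by TION -> CHION or QU -> CW in front of a vowel it would misread; in both cases the new "c" is followed by "h" or "w".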