Coding, English and Text-Based Data, Python, Simon teaching, Simon's Own Code

It takes the sun to the ground, and violet on the observer’s eye.

Simon writes:

This amazing sentence was generated by a Markov text-generation algorithm. What is that? Simply put, it derives a set of rules from a source text, then generates a new text that follows the same rules. Here we call the rules the Markov Blanket and the new text the Markov Chain. (Strictly speaking, "Markov chain" names the whole random process, the rules are usually called a transition table, and "Markov blanket" normally means something else in statistics.) OK, how does this all work?

Let’s take an example: let’s consider the source text to be “Hello, world!”. Then we pick a number called the order. The higher the number, the more closely the generated text resembles the source, so the more sense it makes. We’ll pick 1 for the first examples; we’ll examine what happens with higher numbers later.

Then we generate the Markov Blanket. This is a deterministic process. We start from the beginning: “H”. So we put H in our Markov Blanket. Then we come across “e”. So we put e in our Markov Blanket, but to specify that it’s next from H, we connect H to e by an arrow. Then we stumble on “l”. So we put l in our Markov Blanket, but again, to specify that it’s next from e, we connect e to l by an arrow.

Now, here’s where it gets interesting. What’s next? Well, it’s “l” again. So now we connect l to itself, by an arrow. This is interesting because it’s no longer a straight line!

And we keep going. Once we’re done, our Markov Blanket will look something like this:
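In code, the same arrows can be written down as a dictionary that maps each character to the list of characters that follow it. This is only a minimal sketch for order 1; the helper name `build_transitions` is made up for illustration and is not part of Simon's program.

```python
def build_transitions(text):
    """Map every character to the list of characters that come right after it."""
    transitions = {}
    for current, following in zip(text, text[1:]):
        transitions.setdefault(current, []).append(following)
    transitions.setdefault(text[-1], [])  # the last character has no arrow leaving it
    return transitions

table = build_transitions("Hello, world!")
print(table["l"])  # ['l', 'o', 'd']
print(table["o"])  # [',', 'r']
print(table["!"])  # []
```

The entry for "l" has three arrows (to itself, to "o" and to "d"), which is exactly where the forks come from.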

Once we’ve created our Markov Blanket, then we start generating the Markov Chain from it. Unlike the Markov Blanket, generating the Markov Chain is a stochastic process.

This is just a process of wandering about the Markov Blanket, and noting down where we go. One way to do this is to start from the beginning and follow the path. Whenever we come across a fork, we spin a wheel to see where we go next. For example, here are some possible Markov Chains:

Held!
Helld!
Hellld!
Helorld!
Hello, world!
Helllo, wo, wo, world!
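To double-check that strings like these really are walks through the arrows of "Hello, world!", here is a small sketch. The function name `is_valid_walk` is hypothetical, and it assumes a walk starts at the first character of the source and ends at its last one.

```python
def is_valid_walk(candidate, source):
    """True if every step in candidate follows an arrow that exists in source."""
    allowed = set(zip(source, source[1:]))  # every arrow in the graph
    steps = zip(candidate, candidate[1:])
    return (candidate[0] == source[0]        # starts at the beginning
            and candidate[-1] == source[-1]  # ends where the source ends
            and all(step in allowed for step in steps))

source = "Hello, world!"
for candidate in ["Held!", "Helorld!", "Helllo, wo, wo, world!", "Hewd!"]:
    print(candidate, is_valid_walk(candidate, source))
```

The last candidate, "Hewd!", is rejected because the source text never has a "w" directly after an "e".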

That was an easy one, so let’s do something more complex with code.

First, just an interface to enter in the text, and the order:

text = "" # Variable to hold the text

print("Type your text here (type END to end it):")

while True:
  line = input("") # Read a line of text from standard input
  if line != "END": text += line + "\n" # If we didn't enter END, add that line to the text
  else: break # If we entered END, signify that the text has ended

text = text[:len(text)-1] # Remove the last line break

order = int(input('Type the order (how much it makes sense) here: '))

input("Generate me a beautiful text") # Just to make it dramatic, print this message, and ask the user to hit ENTER to proceed

Next, the Markov Blanket. Here, we store it in a dictionary, and store every possible next letter in a list:

def markov_blanket(text, order):
  result = {} # The Markov Blanket

  for i in range(len(text) - order + 1): # For every n-gram
    ngram = text[i:i+order] # The n-gram that starts at position i

    if ngram not in result: # If we didn't see it yet
      result[ngram] = []
    if i < len(text) - order: # If we didn't reach the end
      result[ngram].append(text[i+order]) # Add the next letter as a possibility
  
  return result # Give the result back

Huh? What is this code?

This is what happens when we pick a number >1. Then, instead of making the Markov Blanket for every character, you make it for every couple of characters.

For example, if we pick 2, then we make the Markov Blanket for the 1st and 2nd letter, the 2nd and 3rd, the 3rd and 4th, the 4th and 5th, and so on. When we generate the Markov Chain, we squash the ngrams that we visit together. So next, the Markov Chain:

import random

def markov_chain(blanket):
  keys = blanket.keys()
  ngram = random.choice(list(keys)) # Starting Point
  new_text = ngram
  while True:
    try:
      nxt = random.choice(blanket[ngram]) # Choose a next letter
      new_text += nxt # Add it to the text
      ngram += nxt # Add it to the ngram and remove the 1st character
      ngram = ngram[1:]
    except IndexError: # If we can't choose a next letter, maybe because there is none
      break
  return new_text # Give the result back

# Now that we know how to do the whole thing, do the whole thing!
new_text = markov_chain(markov_blanket(text, order))
print(new_text) # Print the new text out

OK, now let’s run this:

Type your text here (type END to end it):
A rainbow is a meteorological phenomenon that is caused by reflection, refraction and dispersion of light in water droplets resulting in a spectrum of light appearing in the sky. It takes the form of a multicoloured circular arc. Rainbows caused by sunlight always appear in the section of sky directly opposite the sun.
Rainbows can be full circles. However, the observer normally sees only an arc formed by illuminated droplets above the ground, and centered on a line from the sun to the observer's eye.
In a primary rainbow, the arc shows red on the outer part and violet on the inner side. This rainbow is caused by light being refracted when entering a droplet of water, then reflected inside on the back of the droplet and refracted again when leaving it.
In a double rainbow, a second arc is seen outside the primary arc, and has the order of its colours reversed, with red on the inner side of the arc. This is caused by the light being reflected twice on the inside of the droplet before leaving it.
END
Type the order (how much it makes sense) here: 5
Generate me a beautiful text

And……..it..stops.

Why did it do that?

You see, this is not such a good method. What if our program generated a Markov Blanket that didn’t have an end? Well, our program wouldn’t even get to the end, and it would just wander around and around and around, and never give us a result! Or even if it did, it would be infinite!

So how do we avoid this?

Well, we set another, much bigger number, let’s say 5000, as a cutoff value. If we don’t get to the end within 5000 steps, we give up and output what we have so far. Let’s run this again…
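A minimal sketch of how such a cutoff could look in code. The function name `markov_chain_capped` and the parameter name `max_steps` are assumptions made for this illustration; the original program may implement the limit differently.

```python
import random

def markov_chain_capped(blanket, max_steps=5000):
    """Random walk over the blanket that gives up after max_steps letters."""
    ngram = random.choice(list(blanket))  # starting point
    new_text = ngram
    for _ in range(max_steps):            # never take more than max_steps steps
        choices = blanket[ngram]
        if not choices:                   # dead end: the source text ended here
            break
        nxt = random.choice(choices)      # choose a next letter
        new_text += nxt
        ngram = (ngram + nxt)[1:]         # slide the n-gram window forward
    return new_text

# A tiny blanket that loops forever: "a" always leads to "b" and back again
looping = {"a": ["b"], "b": ["a"]}
result = markov_chain_capped(looping, max_steps=10)
print(len(result))  # 11: the starting letter plus 10 steps
```

Without the cap, this looping example would never return.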

And now, it doesn’t stop anymore! Snippets of example generated text:

It takes the sun to the ground, and violet on the observer’s eye.

This rainbow, a second arc formed by illuminated droplets resulting it.
In a primary rainbow is a meteorological phenomenon the back of the ground, and has the sky. It takes the order of its coloured circles. However, the sun.

Rainbow, a second arc shows red on a line from the section of light in water droplet and has the sun.

In a double rainbow is caused by illuminated droplet on the outer part and refracted when leaving in a spectrum of a multicoloured circles. However, the droplet of water droplets resulting it.
In a double rainbow is a meteorological phenomenon the droplets resulting in a spectrum of a multicoloured circular arc. Rainbow is caused by the inner side the observer’s eye

Play with this project online at: https://repl.it/@simontiger/Markov-Text

Coding, Computer Science, English and Text-Based Data, Python, Simon teaching

Encoding and Cracking Codes in Python

Had great fun learning how to crack codes using Python! Simon is currently following the Programming with Python course on Brilliant.org and showed me how to see whether an encrypted piece is gibberish or a real text is hidden behind it.

Simon writes:

A Caesar Shift is a simple cipher, which was a standard in Roman times. It works like this: shift every character by some fixed amount in the alphabet. Something like this:

Example: Suppose some professor writes his name on his board:

ES. TNJUI

It’s encoded with a Caesar shift. Because it’s a professor’s name, it probably starts with “Dr.”, so it’s probably a shift that turns D into E, and R into S. So we can work backwards from that shift, and get:

DR. SMITH
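Working backwards like this can be sketched in a few lines of Python. The helper `shift_letters` is a hypothetical name used only for this illustration; it shifts uppercase letters and leaves spaces and punctuation untouched.

```python
import string

def shift_letters(text, amount):
    """Shift every uppercase letter by amount, keeping other characters as they are."""
    upper = string.ascii_uppercase
    out = []
    for ch in text:
        if ch in upper:
            out.append(upper[(upper.index(ch) + amount) % 26])
        else:
            out.append(ch)  # spaces and punctuation stay as they are
    return "".join(out)

print(shift_letters("ES. TNJUI", -1))  # DR. SMITH
```

A shift of -1 undoes the encoding because the message was encoded with a shift of +1 (D became E).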

That was an easy one, so let’s do something more complex with code.

One of the messages below is a real text, encoded using a Caesar Shift, the other one is just a random sequence of letters. Can you tell which one is which?

Text 1:
yfdpcpoplhhwdpssbjnsqvtlcpzpxqugtjphvgotuvwxufgoqigxwgkskduooyeuoue
fjlnmsqpgxrmcseeliswdheywseqgcbeothskxdzekgxmmkildjnaqbukprpfaaknsu
qpdwayqaqfxsoapvsgreqydqjnkpjghvrkygtidzibhrqkmocukhcunpjcazzvomtsc
fgycwfltmiegaejwcqrgsnxxcbtcrckufwsdxdhbxgppxcuzapbdhftzmugryfseavv
bssqlxanvmfwwzityziixasivzkmvtfczqmdgkabcnjbyhaoealengfptuedlmvryeb
titbwqkekzdpmbtiphdkwwiduassvbgalxgrfhrjrjplxpujrprqzcpcdqsjorigazt
kwwlnwbjryrzhgcttroyemuwwixwufymnknirzmexyowobvardlqktzajzoijwulomg
ztefdpftjealzapcgipgaaspuzxklvd
Text 2:
swodkdbkfovvobpbywkxkxdsaeovkxngrycksndgyfkcdkxndbexuvoccvoqcypcdyx
ocdkxnsxdronocobdxokbdrowyxdrockxnrkvpcexukcrkddobonfsckqovsocgrycop
bygxkxngbsxuvonvszkxncxoobypmyvnmywwkxndovvdrkdsdccmevzdybgovvdrycoz
kccsyxcbokngrsmriodcebfsfocdkwzonyxdrocovspovoccdrsxqcdrorkxndrkdwym
uondrowkxndrorokbddrkdponkxnyxdrozonocdkvdrocogybnckzzokbwixkwoscyji
wkxnskcusxqypusxqcvyyuyxwigybuciowsqrdikxnnoczksbxydrsxqlocsnobowksx
cbyexndronomkiypdrkdmyvycckvgbomulyexnvocckxnlkbodrovyxokxnvofovckxn
ccdbodmrpkbkgki

Simon has explained a way to see whether the encrypted piece contains meaningful (real) text: one can plot the frequency of each letter as it’s used in the encrypted piece. If all letters have generally similar frequency, it’s not a real text, because in real texts, certain letters are encountered much more often than others. Below are the frequency plots Simon made for the texts above, using a Python package called matplotlib:

Frequencies for text 1:

Frequencies for text 2:

As you can see, the second plot depicts a greater variety in frequencies. “For example, o appears the most, but g does not appear that much. And t does not appear at all!” Simon showed me.
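The "how varied are the frequencies" idea can also be turned into a single number. One common choice, which is not used in the original post, is the index of coincidence: the chance that two letters picked at random from the text are the same. It is roughly 0.067 for typical English and roughly 0.038 for uniformly random letters. A minimal sketch:

```python
from collections import Counter

def index_of_coincidence(text):
    """Probability that two randomly picked letters of the text are equal."""
    letters = [c for c in text.lower() if "a" <= c <= "z"]
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))

flat = "abcdefghijklmnopqrstuvwxyz" * 4    # every letter equally often
skewed = "imetatravellerfromanantiqueland"  # a piece of real English
print(index_of_coincidence(flat) < index_of_coincidence(skewed))  # True
```

A Caesar shift only renames the letters, so it leaves this number unchanged; that is why the test works on the encrypted text directly.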

As it turned out, we could actually use our knowledge about which letters naturally appear more frequently in English-language texts to crack the code! “Which letter is the most frequent one in English writing?” Simon asked me. “Letter e!” I guessed. “So now we know that the letter o in the encrypted text stands for e in the real text!” Simon exclaimed. “All we have to do to decode it now is simply shift the letters by 10 letters back, because e is 10 letters behind the o!”
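The guess described above can be sketched as a short program: find the most frequent letter, assume it stands for "e", and shift back by the difference. The names `shift` and `guess_shift` are made up for this example, and the assumption that the most frequent letter is "e" can fail on short or unusual texts.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def shift(text, amount):
    """Caesar-shift the lowercase letters of text, dropping everything else."""
    return "".join(ALPHABET[(ALPHABET.index(c) + amount) % 26]
                   for c in text if c in ALPHABET)

def guess_shift(ciphertext):
    """Assume the most frequent letter in the ciphertext stands for 'e'."""
    letters = Counter(c for c in ciphertext if c in ALPHABET)
    most_common = letters.most_common(1)[0][0]
    return (ALPHABET.index(most_common) - ALPHABET.index("e")) % 26

plain = "meetmeherebetweenthetrees"
cipher = shift(plain, 10)      # encode with a shift of 10
amount = guess_shift(cipher)   # "o" is most frequent, so the guess is 10
print(amount)                  # 10
print(shift(cipher, -amount))  # meetmeherebetweenthetrees
```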

Simon Writes:

So, what is the message about? Simon tweaked Brilliant’s code to make sure it shifted by the amount of 10…

imetatravellerfromanantiquelandwhosaidtwovastandtrunklesslegsofstonestandinthedesertnearthemonthesandhalfsunkashatteredvisagelieswhosefrownandwrinkledlipandsneerofcoldcommandtellthatitssculptorwellthosepassionsreadwhichyetsurvivestampedontheselifelessthingsthehandthatmockedthemandtheheartthatfedandonthepedestalthesewordsappearmynameisozymandiaskingofkingslookonmyworksyemightyanddespairnothingbesideremainsroundthedecayofthatcolossalwreckboundlessandbaretheloneandlevelsandsstretchfaraway

…put the spaces and punctuation in appropriately…

I met a traveller from an antique land
Who said: “Two vast and trunkless legs of stone
Stand in the desert . . . Near them, on the sand,
Half sunk, a shattered visage lies, whose frown,
And wrinkled lip, and sneer of cold command,
Tell that its sculptor well those passions read
Which yet survive, stamped on these lifeless things,
The hand that mocked them, and the heart that fed:
And on the pedestal these words appear:
‘My name is Ozymandias, king of kings:
Look on my works, ye Mighty, and despair!’
Nothing beside remains. Round the decay
Of that colossal wreck, boundless and bare
The lone and level sands stretch far away.”

So, it’s about archaeology! This is the poem Ozymandias by Percy Bysshe Shelley (1818).

Source Code

Encoder / Decoder:

alphabet = "abcdefghijklmnopqrstuvwxyz"

# convert between letters and numbers up to 26
def number_to_letter(i):
    return alphabet[i%26]

def letter_to_number(l):
    return alphabet.find(l)

# How to encode a single character (letter or not)
def caesar_shift_single_character(l, amount):
    i = letter_to_number(l)
    if i == -1: # character not found in alphabet:
        return "" # remove it, it's spaces or punctuation
    else:
        return number_to_letter(i + amount) # Caesar shift

# How to encode a full text
def caesar_shift(text, amount):
    shifted_text = ""
    for char in text.lower(): # also convert uppercase letters to lowercase
        shifted_text += caesar_shift_single_character(char, amount)
    return shifted_text

### MAIN PROGRAM ###

message = """
paste the text here
"""

code = caesar_shift(message, 2)
print(code)

Code for Plots:

import matplotlib.pyplot as plt
alphabet = "abcdefghijklmnopqrstuvwxyz"

code = """
paste the text here
"""

letter_counts = [code.count(l) for l in alphabet]
letter_colors = plt.cm.hsv([0.8*i/max(letter_counts) for i in letter_counts])

plt.bar(range(26), letter_counts, color=letter_colors)
plt.xticks(range(26), alphabet) # letter labels on x-axis
plt.tick_params(axis="x", bottom=False) # no ticks, only labels on x-axis
plt.title("Frequency of each letter")
plt.savefig("output.png")

English and Text-Based Data, Laws and cultural differences, Milestones, Notes on everyday life, Philosophy, Set the beautiful mind free

What are exams good for?

“I can see that your son has native speaker skills, but we still cannot give him a passing grade”, the English examiner told me in an apologetic tone of voice. She and her colleague at the Brussels examination committee had just finished their assessment of Simon’s oral English and brought Simon, a whole storm of emotions on his face, back to me in the waiting room.

“We were just wondering, does he speak Dutch? We weren’t sure he understood the tasks and they were written in Dutch.” The examiner was sympathetic to Simon’s young age, as most of the other kids taking the same test were about 6 years his senior. As it turned out, the first task was to describe several photos of “criminals” (one of them with many piercings), and the second task involved choosing two things that Simon would like to do from a list of recreational activities (the list included an escape room and a Stonehenge trip). “I just didn’t know what to say!” Simon was catching his breath in between the sobs. “It’s an impossible question, because I had to choose two things I like from a list where there wasn’t anything I liked!” The examiner suggested Simon could have said why he disliked those things. “If you don’t find something interesting you just don’t find it interesting, it’s a given fact! You can’t explain it!” he told her in English.

Another fact is that Simon wouldn’t be able to perform these tasks in any of the three languages he speaks. Not because his vocabulary or grammar don’t stretch that far. I often hear him construct amazingly intact sentences, which I immediately record, like this one recently: “This is incredible! We’ve found a connection between a discrete problem, of what’s the smallest number that divides all of the numbers in a given sequence, to a continuous problem, of what is the fundamental frequency of a combination of sine waves. In other words, we found a discrete solution to a continuous problem!” Simon loves deep philosophical or scientific questions, but often cannot answer open questions lacking substance. He doesn’t care if you ask him to describe someone’s looks on a picture, it’s not important to him. He doesn’t know how to pick two things he likes from a list of things he doesn’t like. It’s just the way his mathematical brain is wired.

“Can I send you one of the many videos on Simon’s YouTube channel as an alternative proof of his excellent oral English skills?” I asked, still shocked at the absurdity of the situation. “Because I dare to say Simon speaks English better than any other student you have examined today”. The examiner agreed that I was probably right in my judgement but couldn’t accept anything other than a completed exam task.

Although distressed about what Simon had to go through, I can’t help feeling content with today’s scoop. What could provide more obvious proof that exams don’t do a good job of measuring one’s skill than this example of a 9-year-old who gives hour-long science lessons on YouTube, speaks at grown-up creative coding meet-ups and is often mistaken for a native speaker, but doesn’t pass his oral English exam because he’s being asked questions that don’t interest him?

It wasn’t Simon who failed today, it was the exam that failed to measure his English. And this raises a whole lot of questions. Why has this system of measurement, which clearly doesn’t work for everyone, become decisive in how our society views someone’s ability? And what is the use of spending so much money and so many nerves on something that doesn’t work?

Wouldn’t it be fairer to both the students and anyone who honestly wants to know their level to actually look at what they can do with their knowledge in real life (their actual projects, videos of their social engagement) instead of the fake setting at the exam? Wouldn’t it be wiser to observe a student’s gradual progress in a given area, instead of stressing the students out and giving them the impression that it’s all about the examiner checking off that box and they can forget what they have learned the next day, because all that matters in our society is the passing grade?

“I’m so neutral about this”, Simon told me (in English) when he was lying in bed the same evening. “Because on the one hand, I kind of feel bad. And on the other hand, it’s so beautiful how we sort of accidentally taught them how exams can show false negatives or false positives. Because the exam showed a false negative. Even the examinators know it’s a false negative”.

Simon on his way to Brussels today

Coding, English and Text-Based Data, JavaScript, Milestones, Simon's Own Code

Bookmarklet

Entering a new domain! Making specifically this bookmarklet (delete bookmarklet) was Simon’s idea. He learned to make bookmarklets today during Daniel Shiffman’s live session on basic bookmarklets and Chrome extensions. The video below is basically only watchable in the beginning and the end (Simon filmed himself debugging in the middle, feel free to skip that 🙂).

Simon is now working on a Chrome extension that would do the same as the bookmarklet he made – delete words. He says that a Chrome extension is more sophisticated and involves more code. He is currently halfway through. The picture below shows Simon and his giant Chrome extension button:


Screenshot of the browser:

[Image: Chrome extension, 11 Nov 2017]

Excerpt from Simon’s conversation with his friend programmers in Slack today:

[Images: Chrome extension, 11 Nov 2017]

Coding, English and Text-Based Data, JavaScript, Milestones, Simon teaching, Simon's Own Code

Announcing Simon’s First Live Stream!

Simon is planning to do his first coding live stream (an online lesson he is going to teach to his audience on YouTube) on Thursday, 2 November at 5 p.m. CET. Topic: Simon’s own speech recognition library Speechjs (in JavaScript). You can view the stream live at our channel on YouTube, please subscribe at:

https://www.youtube.com/channel/UCetWQQLtesZsA0xuqJBkoZw

Simon has been preparing for days, teaching himself how to use Open Broadcast Studio and making a presentation in Google Docs with over 20 slides (see some of them below). He made the presentation all by himself; when I read it through this afternoon I only found one spelling mistake.

[Images: slides from the presentation for the first livestream, 1 Nov 2017]

Coding, Community Projects, Contributing, English and Text-Based Data, JavaScript, Milestones, Simon teaching, Simon's Own Code

Simon made his own speech library: Speechjs

Simon has just finished working on his first library, a #speechlibrary called Speechjs. You can find Simon’s library on GitHub: https://github.com/simon-tiger/speechjs

Simon also added a reference page at: https://github.com/simon-tiger/speechjs/wiki/Reference

You can use this library for any project that uses #speechrecognition and/or speech synthesis. Simon has put it under the MIT (permissive) license, to make sure everyone can use it for free, he emphasized.

While writing the library, Simon also recycled various code he found online, but essentially this library is his own code. He calls the library “just a layer on top of the web speech API” (that means you’re limited to what your browser supports).

Coding, English and Text-Based Data

Computer repeats after Simon

Following the exciting text-to-speech and speech-to-text projects yesterday, this morning Simon made a basic speech-to-text-to-speech demo, which means that the computer can now repeat (parrot) everything Simon says.

Simon relied on what he learned during Daniel Shiffman’s two latest live streams on the Coding Train channel in building these projects.

Coding, English and Text-Based Data, JavaScript, Milestones

Almost talking to the computer!

This is one of those wow projects, so much fun! Simon built his Text-to-Speech and Speech-to-Text demos following Daniel Shiffman’s recent live streams on working with the p5.Speech library and added some extra style features. This basically means that you can type anything on your computer and hear it say what you’ve typed (in any voice or language!) or, in what Simon said was an easier project, yell something to your computer (I love you!) and watch it type it out for you. The next step will be combining the two and including that code into a chat bot code.

You can play with Simon’s Text-to-Speech demo on GitHub at:

Basic text to speech example: https://simon-tiger.github.io/p5_speech/01_text2speech/

Example using different voices: https://simon-tiger.github.io/p5_speech/02_voices/

Basic speech to text example: https://simon-tiger.github.io/p5_speech/03_speech2text/

Code/ repo: https://github.com/simon-tiger/p5_speech

API, Coding, English and Text-Based Data, JavaScript, Milestones, Server Side Programming

Simon’s Spellcheck API

Simon has continued with server side programming and made a spellcheck API! Here is the link, you can play with it yourself by adding new words to the corpus (dictionary):

https://spellcheck-api.herokuapp.com/

Here is how the API works:

And the making of, step by step:

The project is partially based on what Simon learned from Daniel Shiffman’s tutorials about creating web servers and the materials available online in Daniel Shiffman’s Programming A to Z course (analyzing and generating text-based data) and is partially Simon’s own code.

Coding, CSS, English and Text-Based Data, JavaScript, Simon makes gamez

Wikipedia Crawler

Simon has made his version of Daniel Shiffman’s Wikipedia Crawler, graphing the relatedness between Wikipedia articles.

Play with it yourself online at: https://simon-tiger.github.io/wikipedia-crawler/wikipedia/

Code: https://github.com/simon-tiger/wikipedia-crawler/

Simon writes:

How it Works

Enter a query (e.g. apple) and either hit Enter or press the button “Query the API”. If an article called “Apple” exists, a circle will pop up with the word “Apple” in it. If an article called “Apple” doesn’t exist, a circle with something else will pop up. Click the circle (or article) to reveal its related articles. As you might expect, you can click any of those articles to reveal its related articles.

Inspiration

The inspiration comes from Daniel Shiffman and his Coding Train. Link to Daniel’s version here.