Coding, Computer Science, English and Text-Based Data, Python, Simon teaching

Encoding and Cracking Codes in Python

Had great fun learning how to crack codes using Python! Simon is currently following the Programming with Python course on Brilliant.org and showed me how to see whether an encrypted piece is gibberish or a real text is hidden behind it.

Simon writes:

A Caesar Shift is a simple cipher, which was a standard in Roman times. It works like this: shift every character by some fixed amount in the alphabet. Something like this:

Example: Suppose some professor writes his name on his board:

ES. TNJUI

It’s encoded with a caesar shift. Because it’s a professor’s name, it probably starts with “Dr.”, so it’s probably a shift that turns D into E, and R into S. So we can work backwards from that shift, and get:

DR. SMITH

That was an easy one, so let’s do something more complex with code.

One of the messages below is a real text, encoded using a Caesar Shift, the other one is just a random sequence of letters. Can you tell which one is which?

Text 1:
yfdpcpoplhhwdpssbjnsqvtlcpzpxqugtjphvgotuvwxufgoqigxwgkskduooyeuoue
fjlnmsqpgxrmcseeliswdheywseqgcbeothskxdzekgxmmkildjnaqbukprpfaaknsu
qpdwayqaqfxsoapvsgreqydqjnkpjghvrkygtidzibhrqkmocukhcunpjcazzvomtsc
fgycwfltmiegaejwcqrgsnxxcbtcrckufwsdxdhbxgppxcuzapbdhftzmugryfseavv
bssqlxanvmfwwzityziixasivzkmvtfczqmdgkabcnjbyhaoealengfptuedlmvryeb
titbwqkekzdpmbtiphdkwwiduassvbgalxgrfhrjrjplxpujrprqzcpcdqsjorigazt
kwwlnwbjryrzhgcttroyemuwwixwufymnknirzmexyowobvardlqktzajzoijwulomg
ztefdpftjealzapcgipgaaspuzxklvd
Text 2:
swodkdbkfovvobpbywkxkxdsaeovkxngrycksndgyfkcdkxndbexuvoccvoqcypcdyx
ocdkxnsxdronocobdxokbdrowyxdrockxnrkvpcexukcrkddobonfsckqovsocgrycop
bygxkxngbsxuvonvszkxncxoobypmyvnmywwkxndovvdrkdsdccmevzdybgovvdrycoz
kccsyxcbokngrsmriodcebfsfocdkwzonyxdrocovspovoccdrsxqcdrorkxndrkdwym
uondrowkxndrorokbddrkdponkxnyxdrozonocdkvdrocogybnckzzokbwixkwoscyji
wkxnskcusxqypusxqcvyyuyxwigybuciowsqrdikxnnoczksbxydrsxqlocsnobowksx
cbyexndronomkiypdrkdmyvycckvgbomulyexnvocckxnlkbodrovyxokxnvofovckxn
ccdbodmrpkbkgki

Simon has explained a way to see whether the encrypted piece contains meaningful (real) text: one can plot the frequency of each letter as it’s used in the encrypted piece. If all letters have generally similar frequency, it’s not a real text, because in real texts, certain letters are encountered much more often than others. Below are the frequency plots Simon made for the texts above, using a Python package called matplotlib:

Frequencies for text 1:

Frequencies for text 2:

As you can see, the second plot depicts a greater variety in frequencies. “For example, o appears the most, but g does not appear that much. And t does not appear at all!” Simon showed me.

As it turned out, we could actually use our knowledge about which letters naturally appear more frequently in English-language texts to crack the code! “Which letter is the most frequent one in English writing?” Simon asked me. “Letter e!” I guessed. “So now we know that the letter o in the encrypted text stands for e in the real text!” Simon exclaimed. “All we have to do to decode it now is simply shift the letters by 10 letters back, because e is 10 letters behind the o!”

Simon Writes:

So, what is the message about? Simon tweaked Brilliant’s code to make sure it shifted by the amount of 10…

imetatravellerfromanantiquelandwhosaidtwovastandtrunklesslegsofstonestandinthedesertnearthemonthesandhalfsunkashatteredvisagelieswhosefrownandwrinkledlipandsneerofcoldcommandtellthatitssculptorwellthosepassionsreadwhichyetsurvivestampedontheselifelessthingsthehandthatmockedthemandtheheartthatfedandonthepedestalthesewordsappearmynameisozymandiaskingofkingslookonmyworksyemightyanddespairnothingbesideremainsroundthedecayofthatcolossalwreckboundlessandbaretheloneandlevelsandsstretchfaraway

…put the spaces and punctuation in appropriately…

I met a traveller from an antique land
Who said: “Two vast and trunkless legs of stone
Stand in the desert . . . Near them, on the sand,
Half sunk, a shattered visage lies, whose frown,
And wrinkled lip, and sneer of cold command,
Tell that its sculptor well those passions read
Which yet survive, stamped on these lifeless things,
The hand that mocked them, and the heart that fed:
And on the pedestal these words appear:
‘My name is Ozymandias, king of kings:
Look on my works, ye Mighty, and despair!’
Nothing beside remains. Round the decay
Of that colossal wreck, boundless and bare
The lone and level sands stretch far away.”

So, it’s about Archeology! This is the poem Ozymandias by Percy Shelley (1818).

Source Code

Encoder / Decoder:

alphabet = "abcdefghijklmnopqrstuvwxyz"

# convert between letters and numbers up to 26
def number_to_letter(i):
    return alphabet[i%26]

def letter_to_number(l):
    return alphabet.find(l)

# How to encode a single character (letter or not)
def caesar_shift_single_character(l, amount):
    i = letter_to_number(l)
    if i == -1: # character not found in alphabet:
        return "" # remove it, it's spaces or punctuation
    else:
        return number_to_letter(i + amount) # Caesar shift

# How to encode a full text
def caesar_shift(text, amount):
    shifted_text = ""
    for char in text.lower(): # also convert uppercase letters to lowercase
        shifted_text += caesar_shift_single_character(char, amount)
    return shifted_text

### MAIN PROGRAM ###

message = """
paste the text here
"""

code = caesar_shift(message, 2)
print(code)

Code for Plots:

import matplotlib.pyplot as plt
alphabet = "abcdefghijklmnopqrstuvwxyz"

code = """
paste the text here
"""

letter_counts = [code.count(l) for l in alphabet]
letter_colors = plt.cm.hsv([0.8*i/max(letter_counts) for i in letter_counts])

plt.bar(range(26), letter_counts, color=letter_colors)
plt.xticks(range(26), alphabet) # letter labels on x-axis
plt.tick_params(axis="x", bottom=False) # no ticks, only labels on x-axis
plt.title("Frequency of each letter")
plt.savefig("output.png")

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s