This is Simon explaining Diffe-Hellman key exchange (also called DiffeHellman protocol). He first explained the algorithm mixing watercolours (a color representing a key/ number) and then mathematically. The algorithm allows two parties (marked “you” and “your friend” in Simon’s diagram) with no prior knowledge of each other to establish a shared secret key over an insecure channel (a public area or an “eavesdropper”). This key can then be used to encrypt subsequent communications using a symmetric keycipher. Simon calls it “a neat algorithm”). Later the same night, he also gave me a lecture on a similar but more complicated algorithm called the RSA. Simon first learned about this on Computerphile and then also saw a video about the topic on MajorPrep. And here is another MajorPrep video on modular arithmetic.
Had great fun learning how to crack codes using Python! Simon is currently following the Programming with Python course on Brilliant.org and showed me how to see whether an encrypted piece is gibberish or a real text is hidden behind it.
A Caesar Shift is a simple cipher, which was a standard in Roman times. It works like this: shift every character by some fixed amount in the alphabet. Something like this:
Example: Suppose some professor writes his name on his board:
It’s encoded with a caesar shift. Because it’s a professor’s name, it probably starts with “Dr.”, so it’s probably a shift that turns D into E, and R into S. So we can work backwards from that shift, and get:
That was an easy one, so let’s do something more complex with code.
One of the messages below is a real text, encoded using a Caesar Shift, the other one is just a random sequence of letters. Can you tell which one is which?Text 1:
yfdpcpoplhhwdpssbjnsqvtlcpzpxqugtjphvgotuvwxufgoqigxwgkskduooyeuoue fjlnmsqpgxrmcseeliswdheywseqgcbeothskxdzekgxmmkildjnaqbukprpfaaknsu qpdwayqaqfxsoapvsgreqydqjnkpjghvrkygtidzibhrqkmocukhcunpjcazzvomtsc fgycwfltmiegaejwcqrgsnxxcbtcrckufwsdxdhbxgppxcuzapbdhftzmugryfseavv bssqlxanvmfwwzityziixasivzkmvtfczqmdgkabcnjbyhaoealengfptuedlmvryeb titbwqkekzdpmbtiphdkwwiduassvbgalxgrfhrjrjplxpujrprqzcpcdqsjorigazt kwwlnwbjryrzhgcttroyemuwwixwufymnknirzmexyowobvardlqktzajzoijwulomg ztefdpftjealzapcgipgaaspuzxklvdText 2:
swodkdbkfovvobpbywkxkxdsaeovkxngrycksndgyfkcdkxndbexuvoccvoqcypcdyx ocdkxnsxdronocobdxokbdrowyxdrockxnrkvpcexukcrkddobonfsckqovsocgrycop bygxkxngbsxuvonvszkxncxoobypmyvnmywwkxndovvdrkdsdccmevzdybgovvdrycoz kccsyxcbokngrsmriodcebfsfocdkwzonyxdrocovspovoccdrsxqcdrorkxndrkdwym uondrowkxndrorokbddrkdponkxnyxdrozonocdkvdrocogybnckzzokbwixkwoscyji wkxnskcusxqypusxqcvyyuyxwigybuciowsqrdikxnnoczksbxydrsxqlocsnobowksx cbyexndronomkiypdrkdmyvycckvgbomulyexnvocckxnlkbodrovyxokxnvofovckxn ccdbodmrpkbkgki
Simon has explained a way to see whether the encrypted piece contains meaningful (real) text: one can plot the frequency of each letter as it’s used in the encrypted piece. If all letters have generally similar frequency, it’s not a real text, because in real texts, certain letters are encountered much more often than others. Below are the frequency plots Simon made for the texts above, using a Python package called
Frequencies for text 1:
Frequencies for text 2:
As you can see, the second plot depicts a greater variety in frequencies. “For example, o appears the most, but g does not appear that much. And t does not appear at all!” Simon showed me.
As it turned out, we could actually use our knowledge about which letters naturally appear more frequently in English-language texts to crack the code! “Which letter is the most frequent one in English writing?” Simon asked me. “Letter e!” I guessed. “So now we know that the letter o in the encrypted text stands for e in the real text!” Simon exclaimed. “All we have to do to decode it now is simply shift the letters by 10 letters back, because e is 10 letters behind the o!”
So, what is the message about? Simon tweaked Brilliant’s code to make sure it shifted by the amount of 10…
…put the spaces and punctuation in appropriately…
I met a traveller from an antique land
Who said: “Two vast and trunkless legs of stone
Stand in the desert . . . Near them, on the sand,
Half sunk, a shattered visage lies, whose frown,
And wrinkled lip, and sneer of cold command,
Tell that its sculptor well those passions read
Which yet survive, stamped on these lifeless things,
The hand that mocked them, and the heart that fed:
And on the pedestal these words appear:
‘My name is Ozymandias, king of kings:
Look on my works, ye Mighty, and despair!’
Nothing beside remains. Round the decay
Of that colossal wreck, boundless and bare
The lone and level sands stretch far away.”
So, it’s about Archeology! This is the poem Ozymandias by Percy Shelley (1818).
Encoder / Decoder:
alphabet = "abcdefghijklmnopqrstuvwxyz" # convert between letters and numbers up to 26 def number_to_letter(i): return alphabet[i%26] def letter_to_number(l): return alphabet.find(l) # How to encode a single character (letter or not) def caesar_shift_single_character(l, amount): i = letter_to_number(l) if i == -1: # character not found in alphabet: return "" # remove it, it's spaces or punctuation else: return number_to_letter(i + amount) # Caesar shift # How to encode a full text def caesar_shift(text, amount): shifted_text = "" for char in text.lower(): # also convert uppercase letters to lowercase shifted_text += caesar_shift_single_character(char, amount) return shifted_text ### MAIN PROGRAM ### message = """ paste the text here """ code = caesar_shift(message, 2) print(code)
Code for Plots:
import matplotlib.pyplot as plt alphabet = "abcdefghijklmnopqrstuvwxyz" code = """ paste the text here """ letter_counts = [code.count(l) for l in alphabet] letter_colors = plt.cm.hsv([0.8*i/max(letter_counts) for i in letter_counts]) plt.bar(range(26), letter_counts, color=letter_colors) plt.xticks(range(26), alphabet) # letter labels on x-axis plt.tick_params(axis="x", bottom=False) # no ticks, only labels on x-axis plt.title("Frequency of each letter") plt.savefig("output.png")