Sarah Palin's speech sounds remarkably like the output of a computer program. In this post, I make the case by demonstrating a Donald Trump endorsement speech generated right in your browser.
And later, we have a quiz to see if you can tell the difference between Sarah's actual speech and generated text.
All code included!
Donald Trump is campaigning through your town, and on the last day of his visit you decide you want to endorse him for president. You need to give your endorsement speech that evening, and you want it to be as powerful and compelling as the recent endorsement speech by right-wing darling Sarah Palin.
What do you do? The sensible choice is probably to sit down and write the speech yourself, but let's go with the option that is a lot more fun and feeds the topic of this article - programmatically creating Sarah-speak!
This is just a fun post, so let's start with the results. Here you are - your speech is ready (simply click the button to generate another):
If you get a particularly thought-provoking passage, please leave it in a comment below.
If you are unfamiliar with Markov Chains, you may be wondering how this was done. Is this some form of AI?
No, it is not AI - for a Palin speech that would be overkill. It uses a very simple mechanism known as a Markov text generator.
It works by taking a piece of source text – referred to as the "input corpus". It then generates semi-random text in which each word in the new text is chosen as a function of the word that precedes it. This function is simply based on the probabilities of each word pair in the source text.
Suppose we take the following corpus text – something at least as relevant to our times as the Sarah Palin endorsement speech.
How much wood could a woodchuck chuck if a woodchuck could chuck wood?
If the word "woodchuck" was the first randomly selected word, the next word would be chosen at random from the two possibilities: "chuck" and "could", each with a 50% probability. If "chuck" were selected, then the next word would be either "if" or "wood".
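To make that concrete, here is a minimal sketch - just for illustration, separate from the actual generator code shown further down - that builds the next-word lists for the woodchuck corpus:

// Build a map from each word to the list of words that follow it.
// Duplicates in the lists are what carry the word-pair probabilities.
var corpus = "How much wood could a woodchuck chuck " +
             "if a woodchuck could chuck wood?"
var tokens = corpus.split(" ")
var nextWords = {}
for (var i = 0; i < tokens.length - 1; i++)
{
    var w = tokens[i]
    if (!nextWords[w]) nextWords[w] = []
    nextWords[w].push(tokens[i + 1])
}
console.log(nextWords["woodchuck"]) // ["chuck", "could"] - the 50/50 pick
console.log(nextWords["chuck"])     // ["if", "wood?"]

Picking uniformly at random from nextWords["woodchuck"] gives "chuck" and "could" each a 50% chance, exactly as described above.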
This results in text that contains the same words and word pairs as the corpus text, in roughly the same proportions. The process is "stateless": no topic is maintained and no ideas are woven together, so the resulting text is full of non-sequiturs and disjointed run-on sentences. Exactly like Sarah Palin!
In fact, it was while reading bits of her endorsement speech that it struck me just how much her sentence patterns sounded like Markov-generated text. I couldn't resist coding up a quick generator to compare the two.
Let's try this - here are some sentences. Try to guess which are actually from Sarah's speech and which are auto-generated (then click each one to find out):
If you've played with Markov text generators before, you may have noticed that the text generated here reads better than expected. This is because I made a couple of improvements.
First, I generate the text one sentence at a time: I choose a random starter word from the corpus text, then follow the Markov algorithm until it hits a sentence-ender word. These starter and ender words are identified simply by whether they started or ended a sentence in the corpus text. This gives the generated text more of a sentence-like structure, which reads better.
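Here is a rough sketch of how those starter and ender words can be collected. The corpusText variable and the punctuation regex are assumptions for illustration - the full example on GitHub has the real versions:

// A word "ends" a sentence if it ends in terminal punctuation.
function isSentenceEnd(w)
{
    return /[.!?]$/.test(w)
}
var words = corpusText.split(/\s+/)
var starters = words.filter(function(w, i) {
    // the first word, and any word after a sentence ender, starts a sentence
    return i == 0 || isSentenceEnd(words[i - 1])
})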
Additionally, rather than always picking a single following word at a time, I have the generator pick a small random number of following words, which results in short runs of words repeated just as they appeared in the corpus text. This runs the risk of echoing the source text too closely, but by testing and adjusting those parameters I was able to strike a good balance.
The code for this is really simple. First, we split the corpus text into words and make a separate list of starter words. Then, for each sentence, we grab a random starter word and then complete the sentence by repeatedly using the Markov process to obtain the next word. This finishSentence() function is the heart of the process, and is only about a dozen or so lines long:
// Helper functions (lc, arrayLast, rand, isSentenceEnd) and the global
// "words" array are defined in the complete example linked below.

// Pass in a sent array containing the first word at element 0;
// returns the finished sentence as a string.
function finishSentence(sent)
{
    // keep appending short runs of 1-7 words until we hit a sentence ender
    while (!isSentenceEnd(arrayLast(sent)))
        sent = sent.concat(getRandNextWordArray(arrayLast(sent), rand(1, 7)))
    return sent.join(" ")
}

// Return an array of up to wc consecutive corpus words that followed
// a randomly chosen occurrence of the word w.
function getRandNextWordArray(w, wc)
{
    // lc (ListComprehension) is a combination of map and filter.
    // Here we create our NextWordArray: the index of the word following
    // each occurrence of w - any duplicates intentionally included,
    // since they carry the word-pair probabilities.
    var nwa = lc(function(ww, i) {
        if (ww == w)
            return i + 1
    }, words)
    var result = []
    var index = nwa[rand(nwa.length)] // choose a random next-word index
    // continue grabbing words up to wc or until we finish the sentence
    while (wc--)
    {
        var nwo = words[index++]
        result.push(nwo)
        if (isSentenceEnd(nwo))
            return result
    }
    return result
}
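Putting it all together, generating a whole speech is just a loop over finishSentence(), seeding each sentence with a random starter word. Something along these lines - a sketch using the starters list from earlier, not the exact code from the full example:

// Build a speech of sentenceCount sentences, each seeded with a
// random starter word and finished by the Markov process.
function generateSpeech(sentenceCount)
{
    var sentences = []
    while (sentenceCount--)
        sentences.push(finishSentence([starters[rand(starters.length)]]))
    return sentences.join(" ")
}
console.log(generateSpeech(8)) // e.g. an eight-sentence "endorsement"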
See this code on Github for a complete, runnable code example (run it from Node or a browser console).
There are certainly additional techniques that would improve the "quality" of this text generator even further.
One possible improvement is to try to carry a "thread" or "topic" from one sentence to the next. This would require starting in the middle of a sentence and building outward in both directions: a forward Markov pass to finish the sentence, and a reverse Markov pass to begin it. Topic words could be randomly chosen, picked using some English cues (e.g. words that follow "the"), or fed to the generator manually.
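I haven't implemented this, but the reverse half is easy to sketch - a hypothetical getRandPrevWordArray() that mirrors getRandNextWordArray() above, collecting the words that precede a topic word instead of those that follow it:

// Hypothetical sketch of the "reverse Markov" step: grow a sentence
// backward from a chosen topic word toward a sentence start.
function getRandPrevWordArray(w, wc)
{
    // indices of the word immediately before each occurrence of w
    var pwa = lc(function(ww, i) {
        if (ww == w && i > 0)
            return i - 1
    }, words)
    var result = []
    var index = pwa[rand(pwa.length)] // choose a random previous-word index
    // walk backward up to wc words, or until we reach a sentence start
    while (wc--)
    {
        var pwo = words[index--]
        result.unshift(pwo)
        if (index < 0 || isSentenceEnd(words[index]))
            return result
    }
    return result
}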
Of course, any Markov text generator is constrained by the quality of its corpus text (which in the case of Sarah Palin, one could argue, sets a fairly low bar).
If any readers have ideas that could improve the results, please leave a comment below.