A working understanding of the theoretical mathematics used in artificial intelligence (AI) will help you pick up the skills AI developers use. At the very least, it helps you understand what's going on behind the scenes.
AI uses a lot of mathematics (and terminology derived from it), but most of it is conceptual rather than algebraic. We're not going to dig into it too deeply here; instead we'll look around the edges, so that technical whitepapers leave us a little less blindsided.
Andrei Andreevich Markov was a Russian mathematician (and a strong chess player) whose work on processes and probability predates modern computing, but has been gratefully exploited ever since.
Simplify every process into states and transitions
Reducing a process to states and transitions is obviously convenient for computers, but it's also how humans tell stories. We don't recount things in real time; we jump between the important events. Take the story: "John walked to the store, went into the bakery and bought bread, went into the deli and bought a sandwich, greeted his friends, left the store and went home." We understand it easily, yet it contains no timing information at all, just an ordered series of events.
At any given time, John's state can be summarized as one of the following:
- Traveling (to and from the store)
- Shopping (buying bread and a sandwich)
- Chatting (with friends)
And the transitions can be summarized as follows:
- From home to the store and back again.
- From one shop to another.
- From shopping to chatting and back to shopping again.
We've created zones that John moves between. To John, these are all ordinary everyday activities. But to a nosy neighbor observing several similar trips, John's movements would appear random, even though they're drawn from a small number of choices. John's journey can be described as a probabilistic process.
Let's leave John at home for a while. Wikipedia's definition of a Markov chain is: "A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event."
In other words, what happens next depends only on the current state. If we consider John's journey from the point of view of that nosy neighbor, what John does next appears to depend entirely on what he is doing now. For example, he can only meet and chat with his friends if he is already at the store.
Note that although each option out of a given state has a different probability, the probabilities out of each state always add up to 100%. John can move from shop to shop, so that transition loops back to the state it just left; the same goes for chatting. As far as the neighbor can tell, John only ever leaves the house to go to the store, so that transition is the only option from home, and is therefore 100%.
By generating a series of random numbers from 1 to 100 and assigning each option an appropriate range, you can "walk with John," so to speak. So I asked Claude 3 for help.
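A random walk along these lines can be sketched in a few lines of Python. The state names and transition probabilities below are assumptions for illustration; the only hard requirement is that each state's outgoing probabilities sum to 1.

```python
import random

# Hypothetical transition probabilities for John's day (assumed values;
# each state's outgoing probabilities sum to 1.0).
transitions = {
    "home":      [("traveling", 1.0)],
    "traveling": [("shopping", 0.7), ("home", 0.3)],
    "shopping":  [("shopping", 0.5), ("chatting", 0.3), ("traveling", 0.2)],
    "chatting":  [("shopping", 0.8), ("traveling", 0.2)],
}

def step(state):
    """Pick the next state with a random roll, weighted by probability."""
    roll = random.random()  # uniform in [0, 1)
    cumulative = 0.0
    for next_state, prob in transitions[state]:
        cumulative += prob
        if roll < cumulative:
            return next_state
    return next_state  # guard against floating-point rounding

random.seed(42)          # make the walk reproducible
state = "home"
walk = [state]
for _ in range(8):
    state = step(state)
    walk.append(state)
print(" -> ".join(walk))
```

Each call to `step` depends only on the current state, which is exactly the Markov property in miniature.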
And from here, the walk proceeds from home.
And one last thing. Mathematicians prefer to express this type of model as a matrix, with probabilities always written as decimal numbers between 0 and 1.
- The transition matrix is always square (n-by-n), with its size determined by the number of possible states.
- The rows represent the current state and the columns represent the next state.
- The probabilities in each row (i.e., out of each current state) sum to 1.
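The properties above are easy to check in code. Here's a minimal sketch using NumPy, with assumed probability values for John's four states:

```python
import numpy as np

# Hypothetical transition matrix for John's four states (assumed values).
# Rows = current state, columns = next state.
states = ["home", "traveling", "shopping", "chatting"]
P = np.array([
    [0.0, 1.0, 0.0, 0.0],   # home -> always traveling
    [0.3, 0.0, 0.7, 0.0],   # traveling -> home or shopping
    [0.0, 0.2, 0.5, 0.3],   # shopping -> traveling, shopping, chatting
    [0.0, 0.2, 0.8, 0.0],   # chatting -> traveling or shopping
])

assert P.shape == (len(states), len(states))  # square, n-by-n
assert np.allclose(P.sum(axis=1), 1.0)        # each row sums to 1

# Multiplying a state distribution by P gives the distribution one step later.
start = np.array([1.0, 0.0, 0.0, 0.0])        # John starts at home
after_two_steps = start @ P @ P
print(after_two_steps)                        # [0.3 0.  0.7 0. ]
```

The matrix form also buys you something the diagram doesn't: multiplying a distribution by the matrix repeatedly predicts where John is likely to be several steps into the future.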
So when are Markov chains useful for solving problems? Basically, when you want to model something that moves between discrete states, but you don't understand (or don't need to understand) how it works internally.
You might think, "But John knows what he's doing, right?" He does, but since we are observing John from the outside (as the nosy neighbor), his actions appear random from the observer's perspective. The mathematics isn't trying to understand anything; it's just a platform for making predictions.
We've seen some of these basics in state machines, but those typically model the internal state of software rather than a real-world system.
How to use Markov chains in AI
Markov chains are used in predictive text design. As more words are entered and retrieved by the model, a new set of statistics is folded into the updated Markov chain.
Note that the letters of the alphabet don't change when extra words are added; new transitions simply appear and the probability weights shift. I covered a bit of this when creating a lame Shakespeare generator, which used a corpus of Shakespeare's sonnets to calculate the weights.
Predictive text in English typically looks at the most recent characters and works from there. A more sophisticated model is obtained by making the probability of each successive letter depend on the preceding letters, and production systems work with "tokens" rather than single characters.
So a Markov model of order 2 predicts each character with a probability that depends on the two characters immediately before it. You may also have come across the term "n-gram" for this. For example, if "th" occurs 100 times in the corpus, and of those occurrences "the" accounts for 60, "thi" for 25, "tha" for 10, and "tho" for 5, the model predicts that the letter following "th" is "e" with probability 0.6, "i" with probability 0.25, "a" with probability 0.1, and "o" with probability 0.05.
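An order-2 model like this can be built directly from counts. The toy corpus below is an assumption for illustration, not the article's actual data; the counting logic is the general technique.

```python
from collections import Counter, defaultdict

# A tiny illustrative corpus (assumed text, chosen to be rich in "th").
corpus = "the thin thane thought that the theory was thorough"

# For every two-character context, count which character follows it.
counts = defaultdict(Counter)
for i in range(len(corpus) - 2):
    pair, following = corpus[i:i + 2], corpus[i + 2]
    counts[pair][following] += 1

# Convert the counts after "th" into probabilities: an order-2 prediction.
total = sum(counts["th"].values())
probabilities = {ch: n / total for ch, n in counts["th"].items()}
print(probabilities)  # {'e': 0.375, 'i': 0.125, 'a': 0.25, 'o': 0.25}
```

Swap in a bigger corpus and the same few lines give you the kind of weights the "th" example above describes.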
In the case of sentence completion in the Google search bar, the corpus is search terms from around the world. Because that corpus is so large, it also captures common misspellings, and the whole system works a little differently.
If you've done a certain amount of development, much of this will feel familiar, as linked chains of information appear in different guises from time to time. Looking back at the mathematics should make future AI developments seem a little less mysterious.