“How is OVH evolving towards SEO Data Science?” This is the question that Vincent and I answered during our talk at the OVH Summit on October 17, 2017, in Paris. You will find the replay below, as well as an article gathering previously unpublished material on Data Science in SEO.
Big Bang isn’t the right word, but it’s the first word that comes to mind.
On one side there is the Big (data): a lot of data, more and more, produced by our websites, our servers, our monitoring tools, search engines, APIs…
On the other side, the “Bang!”: the sound of the guillotine blade falling to cut off the head of the search engine as we know it today. It falls and promises us a radical change, the famous artificial intelligence era.
Many of us are wondering how our field will evolve, and I would like to talk to you about Data SEO, the future that I would like to give to my own work.
Data in SEO
The first SEO tool I used, in 2012, was Xenu's Link Sleuth (I’ve since replaced it with Screaming Frog), quickly followed by Excel, Yooda SeeUrank (then AWR Cloud), Piwik (then AT Internet at OVH), Majestic, Webpagetest, Sitespeed.io, Oncrawl, Graylog, 1.fr, Visiblis, SEO Hero, Quantum SEO, Ranxplorer, Ubersuggest, Dataiku (a Data Science tool that I use to cross-analyze logs) and so on.
In addition, there are APIs, and of course search engine tools: Search Console, Trends, Page Speed, Mobile-Friendly Test, Structured Data Testing Tool, Analytics, etc.
Faced with all this data from separate sources, our mission is threefold:
- Understand data (what does each metric represent, how data is collected and processed, and how to use it?)
- Manipulate and analyse data, sometimes in a completely new way, to output actions to develop traffic / awareness / sales
- Give meaning to data and communicate it to other teams, often helped by data visualisation
Faced with the tasks above, Data Scientist isn’t the right word, but it’s the first word that comes to mind, as the Search Engine Land website pointed out in its article “10 Reasons Why You, The Search Marketer, Can Call Yourself A Data Scientist.”
Let’s go on.
By nature, SEO follows an experimental scientific reasoning that could also be described as empirical: observation > hypothesis > experiment > analysis > new hypothesis and so on. A real machine for producing SEO knowledge and fighting against the secrecy of Google’s algorithms.
As Olivier Tassel explained (FR) at the 2017 Paris SEO Camp, I think the future of SEO is not empirical but data-centric. Large-scale exploratory data processing also contributes to the production of knowledge; experiments are not the only source.
Let’s take the reasoning further by looking at Google’s side and in particular its artificial intelligence.
RankBrain’s influence
Reverse engineering isn’t the right word, but it’s the first word that comes to mind.
Beyond knowing how Google (and SEO) works, every company should today know its own ranking factors. For me, this is one of the key stakes of exploiting data for SEO, especially since Google uses RankBrain for every search.
To illustrate the situation, the old Google used “static” code with thousands of conditions to classify pages (the Amit Singhal era). The new Google, with RankBrain, favors a more mathematical and statistical approach. It is based on artificial intelligence that uses neural networks: a branch of machine learning called deep learning, as explained in this Wired.com post.
Let’s analyze the situation through the SEO prism
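One simple way to start exploring your own ranking factors is to measure how strongly each page metric moves with position. The sketch below computes a Spearman rank correlation in plain Python. The metric (load time), the sample values and the function names are hypothetical, chosen only for illustration, and a correlation merely hints at a factor; it does not prove that Google uses it.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no ranking information
    return cov / sqrt(vx * vy)


# Hypothetical sample: SERP position and load time (seconds) for six pages.
positions = [1, 2, 3, 4, 5, 6]
load_times = [0.8, 1.1, 0.9, 1.6, 2.0, 2.4]

rho = spearman(positions, load_times)
print(f"Spearman correlation between position and load time: {rho:.2f}")
```

A value close to 1 or -1 suggests the metric deserves a closer look on this site and for this set of queries; a value near 0 suggests it does not, at least in this sample.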
Neural networks learn their own decision rules from data instead of following rules written by engineers. If an AI is able to control rankings in Google’s SERPs, it seems extremely complicated to know precisely which ranking factors it uses, and this can lead to several situations:
- Ranking factors considered to have very little or no importance for site ranking (HTTPS, Facebook shares, W3C validation) could, in a given context and for a given query, become the most significant factors. Keep in mind that artificial intelligence manages classification itself, according to its own concepts and data.
- We can also imagine that the AI is using new ranking factors that Google engineers would not have thought of, without them realizing it.
- A page could fall in the rankings for no apparent reason, and the AI could also change the number of results. The AI could make mistakes, some of which go unnoticed, as explained in Social Media Today.
- Ultra-customization of search results: results based on the user’s personality, mood, knowledge, expenses, income, habits, etc.
Time to upgrade SEO as Google does
Our job is data and KPI driven, but humans are still at the heart of the analysis and exploitation process, not to mention the large number of “mechanical” actions (searching for the best keywords, filling in alt attributes, reducing the weight of images). The least we can say is that SEO is still quite far from artificial intelligence…
On the other hand, we are at a key stage in the evolution of SEO: Big Data technologies have become more accessible (in terms of cost, storage and computing). This makes it possible to centralize data (datalake), to process it with technologies like Hadoop, and to get value from these data sources by applying machine learning models. All of this is being democratized by Data Management Platforms like Dataiku Data Science Studio.
With this in mind, I tried to list the missions of an SEO Data Scientist:
- Create, maintain and exploit an SEO datalake gathering all data sources: web analytics, netlinking, semantics, search console, social networks, webperfs, logs, competitors, search trends, events, Business Intelligence data, crawl data, other sources of traffic
- Master the ranking factors of their website(s)
- Improve the search for traffic opportunities using data science techniques (on this subject, I invite you to consult the article I wrote about my use of the R language for SEO)
- Create an intelligent alerting system and use predictive analysis models to anticipate issues: loss of ranking, loss of traffic, deindexing…
- Improve website architecture and internal linking through genetic algorithms
- Produce semantic analyses by crossing big query datasets (using text2vec, n-grams, co-occurrences, clustering, etc.)
- Process mining (users and bots)
- Automate “mechanical” SEO actions as much as possible (“There is always an algorithm for this”, as Sylvain Peyronnet says)
- Explore future SEO horizons: automatic text generation, anticipating Googlebot behaviour, etc.
- Act as a bridge between traffic / digital marketing teams and data / BI teams
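To make the alerting mission more concrete, here is a minimal sketch of a traffic-drop alert, assuming a simple list of daily organic visits. It flags a day when its visits fall far below the average of the previous days. The function name, the 7-day window, the threshold and the sample figures are all assumptions for illustration; a real system would also need to handle seasonality and use a proper predictive model.

```python
from statistics import mean, stdev


def find_traffic_drops(daily_visits, window=7, threshold=3.0):
    """Return the indices of days whose visits fall more than `threshold`
    standard deviations below the mean of the previous `window` days."""
    alerts = []
    for i in range(window, len(daily_visits)):
        history = daily_visits[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any decrease as a drop.
            if daily_visits[i] < mu:
                alerts.append(i)
            continue
        z_score = (daily_visits[i] - mu) / sigma
        if z_score < -threshold:
            alerts.append(i)
    return alerts


# Hypothetical daily organic visits; the last day simulates a sudden loss.
visits = [1000, 1020, 980, 1010, 990, 1005, 995, 1000, 400]
drops = find_traffic_drops(visits)
for day in drops:
    print(f"Alert: day {day} had {visits[day]} visits, well below the recent average")
```

The same comparison against a trailing baseline could be applied to other series from the datalake, such as the number of pages crawled by bots per day or the number of indexed pages.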
As Vincent concluded in his talk at TeknSEO 2017 (an SEO event in Switzerland), “now machines can learn and adapt, let’s take advantage of it to create new jobs”. It is up to us to use the same technologies as Google to understand it better.
Sun Tzu isn’t the right reference, but it’s the reference that comes to mind.
Did you like this article? I am counting on you to share it and follow me on Twitter.
bravo! this is a fantastic post. Thank you for sharing the video as well.
Great post, Remi. Data science is a very important concept and every SEO needs to incorporate it in their strategies. Thanks again for such an informative article.
great , i just wanna read the book!