This article is the last one of a series that is looking into creating your own Social Media Monitoring spreadsheets to spy on your competitors or monitor the health of your clients’ accounts. This specific article is about retrieving the data of Instagram Accounts and automatically putting them in Google Sheets.
We are still going to use the website SocialBlade for the same reasons as before: Code simplicity.
In this tutorial, I want to focus my effort on being able to display the followers/following count, the number of posts made, the engagement rate of the account and the average number of likes and comments for each account. Ready? Let’s do that.
PS: If you’re not interested in learning how to do your own spreadsheet (No judgment), there’s a protected spreadsheet that you can make a copy of at the end of this article so feel free to use it.
Using =IMPORTXML To Extract Web-Page Content
Once again, the formula “IMPORTXML” is going to be the driver of this project. IMPORTXML is basically a web-scraper built-in Google Sheets.
This is how it works:
The way this function works is by taking two parameters: The URL where the data is stored and the exact emplacement inside the URL where the data is located in XPath format.
I will work you through how XPath works for our case but here’s a link to the documentation found on W3Schools if you are interested: https://www.w3schools.com/xml/xpath_syntax.asp
Now let’s take a look at one of my Instagram accounts (@neon_onthe_gram) on Socialblade (Link: https://socialblade.com/instagram/user/neon_onthe_gram)
As stated previously, these are the informations we’re going to extract from the webpage to put in our Google Spreadsheet.
If you “Inspect” (Right-Click -> Inspect) the number “211” that is the number of followers I have, you will get access to these information:
If you have a bit of HTML knowledge, you can figure out that every <div> that has the class “YouTubeuserTopInfo” contains one of the segments of the data we’re trying to extract. Unfortunately, the actual value we are interested in is not stored in an HTML tag that is easy to identify (Note: The number “211” is stored in a <span> tag that does not have any “id” or “class” attribute.). In practicality, it just means that we are going to have to extract the six “div” in their entirety and then split the data to remove the text and only keep the numbers.
The good news is, because the six divs we’re interested in all share the same class, it makes it really easy for us to get this data in a table format like so:
This single formula generated all of the cells displayed in the image above.
Now, the tricky part is to just keep the last column of this data (The column C with the values) and put it in a line instead of a column to be able to sort the values easily once we are going to display.
To do so, we’re going to use a combination of the formula TRANSPOSE and INDEX. Here’s the formula:
The purpose of the INDEX function is to specify that out of the table output generated by the IMPORTXML function, we only want to display every row of the third column.
The TRANSPOSE function is a simple function that inverts the direction of a range of cells so instead of having the output displayed vertically, we can have it displayed horizontally.
Creating the Dynamic Spreadsheet
Now that we have the formula that extracts the data we are looking for, we just have to dynamically create the URLs where the data is stored for every username we are interested in.
The first step to get this going is to get the columns ready like so:
The goal here is to type in the username of an account in
You can copy paste the block below:
="https://socialblade.com/instagram/user/" & A2
This will ensure that by dragging down the cell B2, the URL that will be displayed will always reflect the one of the username in front of it.
Now we need to create the IMPORTXML function so it uses the Cell reference in the B column and can also be dragged down to automatically reflect the URL in front of itself.
And the code here:
Now you can drag down both B2 and C2:
As you can notice, this is not the cleanest looking sheet with all of those #REF errors. The secret here is to add a formula in front of the formula present in B2 and C2:
=if(A2="","","https://socialblade.com/instagram/user/" & A2)
One way to read this one is “If Username cell is empty, show nothing. Otherwise, create the URL based on Username”
And we can do the same with the C column:
Now, that’s how our spreadsheet look like if we drag down the cells B2 and C2:
And that’s pretty much it. here is the link to the Spreadsheet if you just want to make a copy of it for yourself: https://docs.google.com/spreadsheets/d/1mVaCROebN-UXnLwzt5olBA0x6kIFSJuOnrptvyj0Wto/edit?usp=sharing
If you have any question, feel free to leave a reply down below and I will get back to you as soon as I can.
Also consider subscribing to the Newsletter if this topic interested you. I’m in no shortage of articles ideas around this subject and am definitely letting my subscribers know as soon as a new article is posted that could be of any interest to you guys.