San Francisco: Tech giant Google has clarified how its Googlebot crawls and indexes pages, saying it will crawl only the first 15 MB of a webpage and that anything after this cutoff will not be included in ranking calculations.
Google specified in the help document that “any resources referenced in the HTML such as images, videos, CSS and JavaScript are fetched separately”.
“After the first 15 MB of the file, Googlebot stops crawling and only considers the first 15 MB of the file for indexing,” Google said.
“The file size limit is applied on the uncompressed data,” it added.
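Because the limit applies to uncompressed bytes, site owners can sanity-check a page's raw HTML size themselves. The following is a minimal Python sketch (the URL is a placeholder, and the gzip handling is an assumption about how a given server responds) that fetches a page and compares its uncompressed size against the 15 MB cutoff:

    import gzip
    import urllib.request

    URL = "https://example.com/"   # hypothetical page to check
    LIMIT = 15 * 1024 * 1024       # Googlebot's 15 MB cutoff, uncompressed

    with urllib.request.urlopen(URL) as resp:
        body = resp.read()
        # Google applies the limit to uncompressed data, so decompress
        # first if the server returned gzip-encoded bytes.
        if resp.headers.get("Content-Encoding") == "gzip":
            body = gzip.decompress(body)

    print(f"{len(body):,} bytes of HTML; within the limit: {len(body) <= LIMIT}")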
As per the report, this left some in the SEO community wondering whether Googlebot would completely disregard text that appeared below images in HTML files if those images pushed it past the cutoff.
“It is specific to the HTML file itself like it’s written,” John Mueller, Google Search Advocate, clarified via Twitter.
“Embedded resources/content pulled in with IMG tags is not a part of the HTML file,” he added.
To ensure important content is weighted by Googlebot, it must now appear near the top of web pages.
This means code must be structured in a way that puts the SEO-relevant information within the first 15 MB of an HTML or supported text-based file.
It also means images and videos should be compressed rather than encoded directly into the HTML, whenever possible.
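To illustrate why inlining media matters for the cutoff, here is a short Python sketch (the image bytes and file path are hypothetical) comparing an IMG tag that embeds a picture as a base64 data URI with one that merely references it. Only the former counts toward the HTML file's 15 MB budget:

    import base64

    image_bytes = b"\x00" * (2 * 1024 * 1024)  # stand-in for a 2 MB image

    # Inlining the image as a data URI makes it part of the HTML file
    # itself, so its bytes count toward the 15 MB limit.
    inlined = ('<img src="data:image/png;base64,'
               + base64.b64encode(image_bytes).decode() + '">')

    # Referencing the image keeps the tag tiny; Googlebot fetches the
    # file separately, outside the HTML file's limit.
    referenced = '<img src="/assets/hero.png">'

    print(len(inlined), len(referenced))  # roughly 2.8 MB vs. under 30 bytes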