How Does a Search Engine Work?
History of Search Engines
The first idea for a search engine dates to 1945, when Vannevar Bush, in his article "As We May Think", emphasized the coming importance of information and the need for scientists to design a way to store in a device the information found in magazines and books. This was the seed of the search engine: he suggested building a memory device called the Memex, which would compress and store information so that it could later be retrieved with speed and flexibility. The first well-documented search engine, one that searched content files, namely FTP files, was Archie, which launched on 10 September 1990.
Archie, the first tool used for searching information on the Internet, took its name from "archive" without the "v". It was designed by Alan Emtage, Bill Heelan and Peter Deutsch, computer science students at McGill University in Montreal, Quebec, Canada. One of the first available "all text" crawler-based search engines was WebCrawler, which came out in 1994. Unlike its predecessors, it let users search for any word on any web page, which has since become the standard for all major search engines, and it became widely known to the public. Also in 1994, Lycos, which started at Carnegie Mellon University, was launched and became a major commercial tool.
The commercial selling of search terms began in 1998 with a small search engine company called goto.com. The move had a significant impact on the search engine business, which became one of the most profitable businesses on the Internet. Around 2000, Google's search engine rose to prominence. The company produced much better results with an algorithm called PageRank, explained in the paper "The Anatomy of a Large-Scale Hypertextual Web Search Engine" by Google's founders Sergey Brin and Larry Page. Google also maintained a minimalist interface for its search engine, which became so popular that "to Google" turned into a synonym for searching the web.
How a Search Engine Approaches a Search
A search engine performs its work in real time through three core procedures:

1. Crawling
2. Indexing
3. Ranking
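The crawling and indexing procedures can be sketched in a few lines of Python. The "web" here is a made-up in-memory dictionary of pages and links rather than real HTTP fetching, so the sketch stays self-contained:

```python
from collections import deque

# A toy in-memory "web": page URL -> (page text, outgoing links).
# These pages and URLs are hypothetical, for illustration only.
WEB = {
    "a.com": ("search engines crawl the web", ["b.com", "c.com"]),
    "b.com": ("crawlers follow links", ["c.com"]),
    "c.com": ("indexes map words to pages", []),
}

def crawl(seed):
    """Breadth-first crawl from a seed URL, building an inverted index."""
    index, seen, queue = {}, {seed}, deque([seed])
    while queue:
        url = queue.popleft()
        text, links = WEB[url]
        for word in text.split():          # indexing: word -> set of pages
            index.setdefault(word, set()).add(url)
        for link in links:                 # crawling: follow each new link
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index

index = crawl("a.com")
print(sorted(index["crawl"]))  # pages containing the word "crawl": ['a.com']
```

A real crawler replaces the dictionary lookup with an HTTP fetch and an HTML link extractor, but the queue-and-visited-set structure is the same.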
We can understand the search process itself in four steps:
Step 1 (Search)
The search engine first interprets the keywords you type, adds synonyms and related terms, and suggests completions while you are still typing. It then uses its algorithms (mathematical formulas) to work out exactly what you are searching for.
Step 2 (Sort)
The words you type are sent to the search engine's servers, which hold information gathered from across the World Wide Web: words, videos, songs, photos and much more. This information is continuously collected and updated by "spiders" (crawlers), which crawl pages and select the relevant information.
Step 3 (Collect)
The search engine now matches your typed words against its index and filters the thousands of matching pages by criteria such as publication date, how many people have visited a page, its reliability, and many others.
Step 4 (Voila!)
Finally, the pages selected by these ranking criteria appear as links on your results page, from where you reach your information.
Can search engines find your pages by crawling?
As you have just learned, making sure your site can be crawled and indexed is a prerequisite to appearing in the SERPs. If you already have a website, a good starting point is to check how many of your pages are in the index. This gives you a solid indication of whether Google is crawling and finding all the pages you want it to find.
One way to see your indexed pages is "site:yourdomain.com", an advanced search operator. Go to Google and type "site:yourdomain.com" into the search bar. This returns the results Google has in its index for the specified site:
The number of results that Google displays is not exact, but it gives you a solid idea of which pages on your site are indexed and how they currently appear in search results.
For more accurate results, monitor and use the Index Coverage report in Google Search Console; you can sign up for a free Google Search Console account if you do not have one. With this tool, you can submit sitemaps for your site and monitor how many of the submitted pages have actually been added to Google's index.
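A sitemap is simply an XML file listing the URLs you want crawled. A minimal sketch, using the placeholder yourdomain.com and a made-up date, looks like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yourdomain.com/</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
  <url>
    <loc>https://yourdomain.com/about</loc>
  </url>
</urlset>
```

You then submit the sitemap's URL in Search Console so Google knows where to find it.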
If you are not appearing anywhere in the search results, there are a few possible reasons:

- Your site is brand new and has not been crawled yet.
- Your site isn't linked to from any external website.
- Your site's navigation makes it hard for a robot to crawl it effectively.
- Your site contains crawler directives that are blocking search engines.
- Your site has been penalized by Google for spammy tactics.
What are the types of search engines?
Today we can divide existing search engines into five categories based on how they work:
Crawler Based Search Engines
These search engines operate solely through computer programs (also called spiders, crawlers or bots), hence the name crawler-based search engines. Example: ask.com
Directory Based Search Engines
These search engines show only websites selected by a small human editorial team; no website is added automatically. That is why they are called directory-based search engines.
Hybrid Search Engines
Search engines that use crawlers/bots as well as human-curated listings are called hybrid search engines, e.g. Google, Yahoo.
Meta Search Engines
These search engines do not keep millions of websites in their own database. Instead, when people search in them, they forward the query to big search engines such as Google and Yahoo and present those results. Examples: DuckDuckGo, Dogpile
Specialty Search Engines
These search engines were built to serve the needs of a particular niche, such as local listings (JustDial) or shopping (Yahoo Shopping).
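The core job of a meta search engine, merging the ranked lists returned by several underlying engines, can be sketched with reciprocal-rank fusion, one common merging heuristic. The engines and their result lists below are made up:

```python
# Hypothetical best-first result lists from two underlying engines.
RESULTS = {
    "engine_a": ["x.com", "y.com", "z.com"],
    "engine_b": ["y.com", "w.com", "x.com"],
}

def merge(results, k=60):
    """Reciprocal-rank fusion: a page's score is the sum of 1/(k + rank)
    over every engine that returned it, so pages ranked highly by
    several engines rise to the top of the merged list."""
    scores = {}
    for ranking in results.values():
        for pos, url in enumerate(ranking):
            scores[url] = scores.get(url, 0) + 1.0 / (k + pos + 1)
    return sorted(scores, key=scores.get, reverse=True)

print(merge(RESULTS))  # ['y.com', 'x.com', 'w.com', 'z.com']
```

"y.com" wins because both engines rank it near the top, even though neither puts it in first place; that consensus effect is what meta search engines rely on.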
What is a robots.txt file?
Robots.txt files are located in the root directory of websites (e.g. yourdomain.com/robots.txt) and suggest which parts of your site search engines should and should not crawl, as well as the speed at which they should crawl it, through specific robots.txt instructions.
How does Googlebot deal with robots.txt files? If Googlebot cannot find a robots.txt file for a site, it proceeds to crawl the site. If Googlebot finds a robots.txt file for a site, it will usually abide by its suggestions and crawl the site accordingly.
If Googlebot encounters an error when trying to access a site's robots.txt file and cannot determine whether one exists, it will not crawl the site.
Not all web robots follow robots.txt. People with bad intentions (e.g. e-mail address scrapers) build bots that ignore this protocol. In fact, some bad actors use robots.txt files to find where you have located your private content. Although it may seem logical to block crawlers from private pages such as login and administration pages so that they do not appear in the index, placing the locations of those URLs in a publicly accessible robots.txt file also means that people with malicious intent can find them more easily. It is better to noindex these pages and gate them behind a login form rather than listing them in your robots.txt file.
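A quick way to see how a well-behaved crawler interprets these rules is Python's standard urllib.robotparser module. The robots.txt content below is a made-up example:

```python
from urllib import robotparser

# A hypothetical robots.txt: all bots stay out of /admin/ and crawl
# slowly; a bot named "BadBot" is barred from the whole site.
rules = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 2

User-agent: BadBot
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())  # parse rules directly, no network fetch needed

print(rp.can_fetch("*", "https://example.com/admin/login"))  # False
print(rp.can_fetch("*", "https://example.com/blog/post"))    # True
print(rp.can_fetch("BadBot", "https://example.com/"))        # False
print(rp.crawl_delay("*"))                                   # 2
```

In production you would call `rp.set_url("https://yourdomain.com/robots.txt")` and `rp.read()` instead of `parse()`, but feeding rules directly makes the behavior easy to test. Note that, as the text above says, these are suggestions: only cooperative bots consult them.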
Scope of the World Wide Web
According to Google, the World Wide Web comprises at least 5.58 billion pages, and Google's index covers more than 60 trillion unique URLs, more than the number of neurons in a human brain.
Most Trusted Search Engine Other Than Google
Here are some search engines that are widely trusted and serve as alternatives to Google:
- CC Search
- Search Encrypt
- Internet Archive
What is an Indian Search Engine?
- Epic Search