Modern methods of information search. Inductive method of information retrieval. In induction, cognition proceeds from individual judgments and facts to general rules and generalizations that express an overall pattern.


Lecture: Using search engine tools (query generation)

The most important functions when working with databases are sorting, filtering (searching) and querying.


Sorting is the process of ordering information by some attribute. Sorts can be ascending or descending. Numeric data is ordered by value, and text data is ordered alphabetically.


In the MS Access database management system, a simple sort is carried out on a single field. Applying a new sort discards the results of the previous one. A nested sort (ordering by several fields at once) can be done with a query.
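The difference between a single-field sort and a nested sort can be illustrated outside Access. The sketch below is plain Python, not Access code, and the records and the field names `city` and `name` are made up for the illustration.

```python
# Hypothetical records; the field names are illustrative only.
records = [
    {"city": "Omsk", "name": "Petrov"},
    {"city": "Kazan", "name": "Sidorov"},
    {"city": "Omsk", "name": "Antonov"},
    {"city": "Kazan", "name": "Belov"},
]

# Single-field sort: ascending by name only.
by_name = sorted(records, key=lambda r: r["name"])

# The same field in descending order.
by_name_desc = sorted(records, key=lambda r: r["name"], reverse=True)

# Nested sort: by city first, then by name within each city.
nested = sorted(records, key=lambda r: (r["city"], r["name"]))

for r in nested:
    print(r["city"], r["name"])
```

The tuple key plays the role that a multi-field ORDER BY plays in a query: the second field only decides the order among records that share the first.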


Filtering is the selection of the information the user needs. Complex selection conditions can be set.

To find data, you can use special characters that form a mask:


* - any number of characters (including none).

? - any single character.

# - any single digit.

[ ] - any single character from those listed in the brackets.

- - any single character from the specified range (used inside brackets, for example [a-d]).

! - any single character except those listed in the brackets (for example [!ab]).

A mask filter can be applied to the entire table or to a subset of data already selected by other criteria.
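To make the mask characters concrete, here is a minimal sketch that translates such a mask into a regular expression and tests values against it. It is written in Python rather than Access; the helper names `mask_to_regex` and `matches` are hypothetical, and the case-insensitive comparison is an assumption of this sketch.

```python
import re

def mask_to_regex(mask):
    """Translate a mask with *, ?, #, [..] and [!..] into a regular expression."""
    parts = []
    i = 0
    while i < len(mask):
        ch = mask[i]
        if ch == "*":
            parts.append(".*")        # any number of characters
        elif ch == "?":
            parts.append(".")         # exactly one character
        elif ch == "#":
            parts.append(r"\d")       # exactly one digit
        elif ch == "[":
            end = mask.find("]", i + 1)
            body = mask[i + 1:end] if end != -1 else ""
            negate = body.startswith("!")
            chars = body[1:] if negate else body
            if chars:
                # Keep ranges such as a-d, but neutralize regex-special \ and ^.
                chars = chars.replace("\\", r"\\").replace("^", r"\^")
                parts.append("[" + ("^" if negate else "") + chars + "]")
                i = end               # skip to the closing bracket
            else:
                parts.append(re.escape(ch))  # unmatched or empty bracket: literal
        else:
            parts.append(re.escape(ch))
        i += 1
    return "".join(parts)

def matches(mask, value):
    """True if the whole value fits the mask (case-insensitive in this sketch)."""
    return re.fullmatch(mask_to_regex(mask), value, flags=re.IGNORECASE) is not None

print(matches("Sm?th", "Smith"), matches("19##", "1987"), matches("b[!ae]ll", "ball"))
```

For example, `matches("b[a-c]d", "bbd")` holds because `b` falls in the range a-c, while `matches("19##", "19a7")` does not because `a` is not a digit.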

Query - a broader function that can include both sorting and filtering. A query lets you select data from multiple fields and tables. You can build a library of queries that are saved for future use. Queries are written in a special language, SQL (Structured Query Language).
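As an illustration of a query that combines filtering and sorting across two tables, here is a small sketch using Python's built-in `sqlite3` module instead of Access. The tables `customers` and `orders` and their contents are invented for the example. Note that SQLite's LIKE uses `%` and `_` as wildcards rather than `*` and `?`.

```python
import sqlite3

# In-memory database with two made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ivanov', 'Moscow'), (2, 'Petrov', 'Kazan'), (3, 'Ivanova', 'Moscow');
    INSERT INTO orders VALUES (1, 1, 250.0), (2, 2, 90.0), (3, 3, 40.0), (4, 1, 120.0);
""")

# One query: join two tables, filter by a mask and a condition, sort on two fields.
query = """
    SELECT c.name, o.total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE c.name LIKE 'Ivan%' AND o.total > 100
    ORDER BY c.name ASC, o.total DESC
"""
rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```

The query text can be stored and run again later, which is the idea behind a saved query library.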


After studying this topic, you will learn and review:

- what search servers are for;
- the purpose of the main parts of search servers;
- what types of information search exist on the Internet;
- the basic rules for composing a query in the Yandex search engine.

Search by URLs

The fastest and most reliable way to find information on the Internet is to go directly to a known URL. Many URLs are given in printed publications and special reference books, and are mentioned on popular radio stations and on television.

♦ Zenit football club fans know the address www.fc-zenit.ru by heart.
♦ Fans of the group "Korol i Shut" are well aware of the group's official site, www.korol.spb.ru.
♦ Fans of the NTV channel can easily find its website at www.ntv.ru.

For quick access to these resources, just launch a browser, such as Internet Explorer, and type the familiar URL in the address bar.

Search engines

There is a huge number of documents on the Internet. Special search engines are created to make it easier to find the necessary information.

Search engines are automatic systems that poll servers connected to the global network and store information about the data available on those servers in their own database. In response to a specially formulated request, a search engine reports where the necessary data can be obtained.

As a rule, search engines consist of three parts: robot, index and request processing program.

The robot (also called a spider or bot) is a program that visits web pages and reads their content, in whole or in part. Each search engine's robot analyzes page content according to its own scheme.

The search engine index is a repository of the search images of the pages visited by robots. The search image of a document (including a web page) is a description of the document's content in a special information retrieval language. This description contains codes for the document's keywords, which reflect its meaning and content. The indexes of different search engines differ in how much information they store and how they store it. The databases of the leading search engines store information about tens of millions of documents, and their indexes occupy hundreds of gigabytes. Indexes are periodically updated and supplemented, so the same query to the same search engine may give different results at different times.

The request handler is a program that, in accordance with the user's request, looks through the index for the necessary information and returns links to the documents found. The program orders the resulting links by decreasing relevance, that is, from the link that best matches the request to the one that matches it least.
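The three parts described above can be sketched in a few lines. This is a toy model, not how any real search engine is built: the `PAGES` dictionary stands in for what a robot has already fetched (the URLs and texts are made up), the index is a simple word-to-pages mapping, and relevance is assumed to be a plain count of matching words.

```python
import re
from collections import Counter, defaultdict

# Stand-in for pages a robot has already visited (hypothetical URLs and text).
PAGES = {
    "http://example.org/fish": "aquarium fish care and aquarium plants",
    "http://example.org/band": "aquarium band concert tickets",
    "http://example.org/cats": "cat care tips",
}

def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"\w+", text.lower())

def build_index(pages):
    """Index: word -> {url: number of occurrences of the word on that page}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word, count in Counter(tokenize(text)).items():
            index[word][url] = count
    return index

def search(index, query):
    """Return URLs containing every query word, most relevant first."""
    terms = tokenize(query)
    if not terms:
        return []
    candidates = set(index.get(terms[0], {}))
    for term in terms[1:]:
        candidates &= set(index.get(term, {}))

    def score(url):
        # Toy relevance: total occurrences of the query words on the page.
        return sum(index[t][url] for t in terms)

    return sorted(candidates, key=lambda u: (-score(u), u))

index = build_index(PAGES)
print(search(index, "aquarium"))
print(search(index, "aquarium fish"))
```

Here the query handler never touches the pages themselves; it only consults the index, which is why results can lag behind the live Web until the index is updated.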

Currently, the most popular for Russian Internet users are three major index-type search engines:

These systems take into account the grammatical features of the Russian language, so the results of their search in Russian-language resources are of higher quality than in Western systems.

Search engines differ in the coverage of information resources:

♦ general-purpose search engines have a database covering all areas of knowledge and are distinguished by an extensive index and a large amount of accumulated information;
♦ special-purpose search engines look only for sites on specific topics, such as music or museum sites.

The main characteristics of search engines are:

♦ volume of documents in the index;
♦ frequency of updating information;
♦ the information space that the search engine robot covers and the variety of types of documents about which information is collected;
♦ request processing speed;
♦ criterion for determining relevance (correspondence of the found document to the search query);
♦ the possibility of detailing and clarifying the request.

Search by rubricator of the search engine

Search directories are systematic collections of links to other Internet resources. The links are organized as a thematic rubricator: a hierarchical structure that you move through to find the information you need.

As an example, consider the structure of the Yandex Internet catalog. This is a general-purpose directory, since it contains links to Internet resources on almost every subject. The catalog contains the following topics:

♦ Business and economics;
♦ References and links;
♦ Society and politics;
♦ Home and family;
♦ Science and education;
♦ Entertainment and recreation;
♦ Computers and communications;
♦ Culture and art.

Each topic includes many subsections, and these in turn contain rubrics, and so on.

Suppose you are preparing an event for Victory Day and want to search the Internet for the words of Bulat Okudzhava's famous military song "You hear the boots rumble." The search can be organized as follows: Yandex Catalog > Culture and art > Music > Author's song.

This search method is quite fast and efficient. At the end, you are offered only 5 links, among which there are links to sites with songs of famous bards. It remains only to find on the site an archive with lyrics by B. Okudzhava and select the desired text in it.

Another example. Suppose you are going to buy a mobile phone and want to compare the characteristics of devices from different companies. The search could follow these catalog headings: Yandex Catalog > Computers and communications > Mobile communications > Mobile phones.

Having received a limited number of references, you can quickly view them and choose a phone by examining the characteristics by firms and modifications of the devices.
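Moving through a rubricator amounts to walking down a tree one heading at a time. The sketch below models a tiny, made-up fragment of such a catalog as nested dictionaries; the link names are placeholders, not real catalog entries, and the `browse` helper is hypothetical.

```python
# A tiny, made-up fragment of a rubricator; the link names are placeholders.
CATALOG = {
    "Culture and art": {
        "Music": {
            "Author's song": ["bard-site-1", "bard-site-2"],
        },
    },
    "Computers and communications": {
        "Mobile communications": {
            "Mobile phones": ["phone-site-1"],
        },
    },
}

def browse(catalog, path):
    """Follow a list of headings down the hierarchy and return what is found there."""
    node = catalog
    for heading in path:
        if not isinstance(node, dict) or heading not in node:
            raise KeyError(f"No heading {heading!r} at this level")
        node = node[heading]
    return node

links = browse(CATALOG, ["Culture and art", "Music", "Author's song"])
print(links)
```

Each step narrows the choice to the subsections of one heading, which is why a catalog search ends with a short list of links rather than thousands of pages.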

Keyword Search

Most search engines can search by keywords. This is one of the most common types of search. To search by keywords, enter one or several words in the search box and click the Search button. The search engine will find the documents in its database that contain these words and display them. There may be many such documents, but in this case more does not necessarily mean better.

Let's conduct some experiments with any of the search engines. Suppose we decide to start an aquarium and we are interested in any information on this topic.

At first glance, the simplest thing is to search for the word "aquarium". Let's check this, for example, in the Yandex search engine. The search result will be over 460,000 pages on 3,500 sites - a huge number of links. Moreover, if you look more closely, among them will be sites that mention B. Grebenshchikov's Aquarium group, shopping centers and informal associations with the same name, and much more that has nothing to do with aquarium fish.

It is easy to guess that such a search cannot satisfy even the unpretentious user. Too much time will have to be spent on selecting among all the proposed documents those that relate to the subject we need, and even more so on getting to know their contents.

You can immediately conclude that it is usually not advisable to search by one word, because it is very difficult to determine the topic of a document, web page or site by one word. The exception is rare words and terms that are almost never used outside their subject area.

Let's try to clarify the search conditions and enter the phrase "aquarium fish". The search result will be a little over 20,000 pages and about 650 sites. As you can see, the number of links has decreased by more than 20 times. This result suits us more, but all the same, among the proposed links you can find, for example, Russian souvenir sets of match labels with images of fish, and collections of computer desktop screensavers, and catalogs of aquarium fish with photos, and aquarium accessories stores.

It is obvious that we should continue moving in the direction of refining the search conditions.

In order to make the search more productive, all search engines have a special query language with its own syntax. These languages are similar in many ways. Learning all of them is quite difficult, but any search engine has a help system that will allow you to master the desired language.

Here are ten simple rules for generating a request in the Yandex search engine.

1. Keywords in the query should be written in lowercase (small) letters. This will ensure that all keywords are searched, not just those that start with an uppercase letter.

2. The search takes into account all forms of a word according to the rules of the Russian language, regardless of the form used in the query. For example, if the word "know" is specified in the query, then its other forms ("knows", "knew", etc.) will also satisfy the search condition.

3. To search for a stable phrase, you should enclose the words in quotation marks, for example, “porcelain dishes”.

4. To search by the exact word form, put an exclamation mark before the word. For example, to find the Russian word for "September" only in the genitive case ("сентября"), write "!сентября".

5. To search within a single sentence, the words in the query are separated by a space or sign &: "adventure novel" or "adventure & novel". Several words typed in the query, separated by spaces, mean that they must all be included in one sentence of the document being searched for.

6. If you want to select only those documents in which every word specified in the query occurs, put a plus sign "+" in front of each of them. If, on the contrary, you want to exclude a word from the search result, put a minus sign "-" in front of it. The "+" and "-" signs must be separated by a space from the preceding word and written together with the following word. For example, the query "Volga -car" will find documents that contain the word "Volga" and do not contain the word "car".

7. When searching for synonyms or words that are close in meaning, you can put a vertical bar "|" between the words. For example, the query "child | kid | baby" will find documents containing any of these words.

8. Instead of a single word in a query, you can substitute an entire expression. To do this, enclose it in brackets, for example "(child | baby | children | baby) + (care | upbringing)".

9. The "~" (tilde) character lets you find documents with a sentence that contains the first word but not the second. For example, the query "books ~ shop" will find all documents containing the word "books" without the word "shop" next to it (within the same sentence).

10. If an operator is written once (for example, & or ~), the search is performed within a sentence. A doubled operator (&&, ~~) searches within the whole document. For example, the query "cancer ~~ astrology" will find documents with the word "cancer" that are not related to astrology.
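The difference between the single and doubled operators in rules 5, 9 and 10 can be shown with a short sketch. This is not Yandex's implementation: it is an illustration that splits text into sentences at `.`, `!` and `?`, compares exact lowercase words and ignores word forms. All function names are hypothetical.

```python
import re

def sentences(text):
    """Split text into sentences at '.', '!' or '?' (a deliberate simplification)."""
    return [s for s in re.split(r"[.!?]+", text.lower()) if s.strip()]

def words(fragment):
    """Set of lowercase words in a fragment."""
    return set(re.findall(r"\w+", fragment.lower()))

def and_in_sentence(text, a, b):        # a & b
    return any({a, b} <= words(s) for s in sentences(text))

def and_in_document(text, a, b):        # a && b
    return {a, b} <= words(text)

def and_not_in_sentence(text, a, b):    # a ~ b
    return any(a in words(s) and b not in words(s) for s in sentences(text))

def and_not_in_document(text, a, b):    # a ~~ b
    found = words(text)
    return a in found and b not in found

doc = "Books are sold in every shop. Old books are kept in the library."
print(and_not_in_sentence(doc, "books", "shop"))   # second sentence has no "shop"
print(and_not_in_document(doc, "books", "shop"))   # but the document does contain it
```

The same document can therefore satisfy "books ~ shop" and fail "books ~~ shop": the single operator needs only one suitable sentence, while the doubled one looks at the document as a whole.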

With a set of the most common terms in the desired area, you can use advanced search. Fig. 3.3 shows the advanced search window in the Yandex search engine. In this mode, the capabilities of the query language are implemented as a form. A similar service, including dictionary filters, is offered by almost all search engines.

Fig. 3.3. An example of an advanced search in the Yandex system

Given the right choice of desirable and required words and the exclusion of undesirable terms, such a search can give good results.

Let's go back to the aquarium fish example. After reading several documents offered by the search engine, it becomes clear that the search for information on the Internet should not begin with the choice of aquarium fish. An aquarium is a complex biological system, the creation and maintenance of which requires special knowledge, time and serious investments.

Based on the information received, a person performing a search on the Internet can radically change the strategy of further search, deciding to study the special literature related to the issue under study.

To search for literature or full-text documents, the following query is possible:

"+ (aquarium | aquarist | aquarist) + for beginners + (advice | literature) + (article | thesis | full-text) - (price | shop | delivery | catalog)".

After processing the request by the search engine, the following result was obtained: pages - 195, sites - at least 43.

As can be seen from the search statistics, the result was very successful. Already the first links lead to the required documents:

Aquarium setup > Tips for the beginner aquarist >
Articles > Aquascope.ru
http://aquascope.ru/modules/wfsection/article.php?page=l&articleid=49 (32KB) - strict match.
ADVICE TO BEGINNER AQUARISTS. How to choose and install an aquarium, how to...
http://www.aquariums.ru/sovna.htm (2KB) 07/23/2002 - non-strict match.

Now you can summarize the results of the search, draw certain conclusions and decide on possible actions:

♦ Stop further search, because for various reasons, the maintenance of the aquarium is beyond your power.
♦ Read the suggested articles and start setting up an aquarium.
♦ Search for materials about hamsters or budgies.

Professional Search

Researchers and specialists will have to take a more thoughtful approach to the organization of the search. When professionally searching for information on the Internet, the following requirements must be met:

♦ high search speed;
♦ reliability of the received information;
♦ completeness of coverage of resources during the search.

Speed. The speed of the search depends mainly on two factors: on competent search planning (selection of search services and tools) and skills in working with an already selected resource (the ability to quickly understand its structure and navigation methods). Search indexes are not enough to ensure search speed. In addition to them, there are a number of search resources on the Internet, the use of which ensures the performance of a professional search.

Reliability. The question of the reliability of information received from the Internet is very relevant, since anyone can place any information there without any control over its compliance with reality. This, in turn, leads to a large number of unreliable sources, such as abstracts and term papers that have flooded the Internet.

There are special search services that allow you to evaluate the reliability of a source of information on the Internet.

Completeness. A necessary condition for a successful full-scale collection of information is knowledge of the main types of resources that exist today and the use of various search services. No search engine can cover all the resources of the Internet.

As a rule, to achieve a positive result, the user must resort to the services of several search engines. You can do it yourself, moving from system to system, or you can entrust this work to one of the metasearch engines (meta is the first component of compound words denoting systems for describing and researching other systems).

Fig. 3.4. Metasearch windows

Metasearch engines do not have their own search databases and use the resources of many other search engines when searching. Because of this, the probability of finding the necessary information is very high. Work in metasearch systems follows the same rules as work in search engines, since metasearch engines are a kind of add-on to search engines and rely on their index databases. Metasearch engines also look much like well-known search engines. Fig. 3.4 shows the windows of the metasearch engines myweb.ru and metabot.ru.
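The idea of an add-on that queries several engines and merges their answers can be sketched as follows. The two "engines" are hypothetical stubs that return fixed, made-up result lists, and the merging rule (summing reciprocal ranks) is an assumption chosen for the illustration, not the method used by myweb.ru or metabot.ru.

```python
def engine_a(query):
    # Hypothetical engine: returns made-up URLs, best match first.
    return ["site1.example/aquarium", "site2.example/fish", "site3.example/shop"]

def engine_b(query):
    # A second hypothetical engine with its own ranking.
    return ["site2.example/fish", "site4.example/plants", "site1.example/aquarium"]

def metasearch(query, engines):
    """Merge ranked lists: a URL scores higher the nearer the top it appears in each list."""
    scores = {}
    for engine in engines:
        for rank, url in enumerate(engine(query)):
            scores[url] = scores.get(url, 0.0) + 1.0 / (rank + 1)
    # Highest combined score first; ties are broken alphabetically.
    return sorted(scores, key=lambda u: (-scores[u], u))

merged = metasearch("aquarium fish", [engine_a, engine_b])
print(merged)
```

A URL returned by both engines accumulates score from each list, so duplicates collapse into one entry that tends to rise toward the top.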

Experience shows that in most cases a better result is achieved by using several independent search indexes than by using one metasearch engine.

Control questions and tasks

1. What is the purpose of a browser program?

2. What browser programs do you know?

3. Where can a user planning an Internet search find URLs?

4. What is the search technology for the rubricator of the search engine?

5. What is the keyword search technology?

6. What requirements must be observed in the professional search for information on the Internet?

7. When do you need to specify the signs "+" or "-" in the search criteria?

8. What search criteria in Yandex are set by the following phrase:

(nanny | teacher | governess) ++ (care | upbringing | supervision).

9. What does doubling the sign (∼∼ or ++) mean when forming a complex query?

10. What is search relevancy?

11. What is the purpose of metasearch engines?

Classification of search tools (their types and kinds)

Search tools can be divided into:
- directories (catalogs);
- search engines.

This classification is based on how information is selected and processed for the search tool's database, namely how automated this process is and who creates the database: people or computers.


Recently, the difference between search engines and directories has been blurring, since their creators try not to draw users' attention to the principles of information selection and instead offer as many similar services as possible, working as universal portals. But the difference in how information is selected and processed remains essential and decisive:
- search engines use robots to find and index information, i.e. the process is fully automated;
- directories are built on the principle that a site "waits" to be accepted, processed and described by a cataloging specialist.


These different principles of operation significantly affect the volume and content of search engines and directories and, accordingly, the search strategy. A search engine indexes the full text of a page. A directory represents a site in general terms: the cataloger annotates and classifies the site according to its general content, and the full texts are not indexed. Search engines index a large number of sites, because the robots that "browse" the contents of the Web operate on the "snowball" principle, traveling from link to link. Directories, on the other hand, are distinguished by the meaningful and orderly selection of sites in their databases (usually these are information-rich sites and/or sites of large physical organizations).


So, some search engines: Altavista, Yandex, Google, Rambler, FastSearch.


And directories: Yahoo, Librarians' Index to the Internet (lii.org), List.ru.


Among those listed, as you noticed, there are both universal global search tools and universal regional ones (in this case, Russian ones). Here are some others: EuroFerret, Voila, Altavista France (fr.altavista.com), UKPlus (ukplus.co.uk).


The division of search tools into global and regional ones is simply a classification by the geographical principle of selecting resources for indexing. Other search tools also restrict the content of their databases, but by topic: FindLaw, Whowhere, MusicSearch, HumorSearch, FindBook.


Note that such specialized or thematic search engines can either search their own databases for your query or search the entire Web using other search engines.


In addition, there are metasearch engines that offer to search in several search engines at the same time, for example Mamma. The advantage is that the maximum number of results is returned; the disadvantage is that search engines do not all share the same query language syntax (for example, not every search engine supports quotation marks).


The main elements of the search engine query language

Let's list the common elements and features of search engine query languages. Most use:
- + or - (to include a term in the search query or exclude it from it);
- "quotation marks" (to mark a phrase or set expression).

Some use the operators AND, AND NOT instead of the + and - signs.


Most allow a word to be truncated with the * character (for example, wish* will find wish, wishes, wishful, wishbone and wishy-washy). Some search engines are sensitive to uppercase and lowercase letters (for example, when searching Altavista for materials about people with the surname Stone, be sure to use a capital letter, because otherwise every site where the word stone occurs will be found).


The main elements of the search engine query language (using Yandex as an example)

Each entry gives the element, what it means, and a query example.

♦ space, & or +: logical AND (within a sentence); example: family law
♦ &&: logical AND (within a document); example: recipes && (melted cheese)
♦ |: logical OR; example: photo | photography | snapshot | photographic image
♦ ( ): word grouping; example: (technology | manufacturing) (butter | cottage cheese)
♦ ~ or -: AND NOT (within a sentence); example: thought ~ law
♦ ~~: AND NOT (within a document); example: guide in paris ~~ (agency | tour)
♦ " ": search for a phrase; example: "poetry anthology"
♦ ! before a word: the exact form of the word; example: !real !moment - "This law enters into force two weeks after its publication."
♦ $title (expression): the expression is present in the Title field of the HTML document; example: $title ("flora and fauna") finds pages with titles such as "Flora and fauna of Siberia"


And, in conclusion:

1) To build a search strategy correctly, first decide what kind of information you want to receive: general information that describes an object or phenomenon as a whole, or details and particulars that may appear in the full text of a document. In the first case it is advisable to use a directory, in the second a search engine.

2) If, when starting a search, you still have no idea exactly what and how much you want to get in the results, it is recommended to use different search tools in combination.

Search tools

Search tools are special software whose main purpose is to give Internet users the most efficient and highest-quality information search. Search tools are hosted on special web servers, each of which performs a specific function:

1. Analysis of web pages and entering the results of the analysis on one or another level of the database of the search server.

2. Search for information at the request of the user.

3. Providing a user-friendly interface for searching for information and viewing the search result by the user.

The working methods are almost the same for all search tools. Before discussing them, consider the following concepts:

1. The search tool interface is presented as a page with hyperlinks, a query string (search string) and query activation tools.

2. Search engine index is an information base containing the result of the analysis of web pages, compiled according to certain rules.

3. A query is a keyword or phrase that the user enters into the search bar. Special characters ("", ~) and mathematical symbols (*, +, ?) are used to form various queries.

The information search scheme is simple. The user types a key phrase and activates the search, thereby receiving a selection of documents according to the formulated (given) request. This list of documents is ranked according to certain criteria so that at the top of the list are those documents that most closely match the user's query. Each of the search tools uses different criteria for ranking documents, both in the analysis of search results and in the formation of the index (filling the index database of web pages).

Thus, entering the same query into different search tools can give different search results. What matters most to the user is which documents appear among the first two or three dozen results and how well they match the user's expectations.

Most search tools offer two ways to search: simple search and advanced search, with or without a special request form. Let's consider both types of search using an English-language search engine as an example.

For example, AltaVista is useful for arbitrary queries such as "Something about online degrees in information technology", while the Yahoo search tool lets you get world news, exchange rates or a weather forecast.

Mastering query refinement criteria and advanced search techniques lets you search more efficiently and find the necessary information quickly. First of all, you can increase search efficiency by using the logical operators Or, And, Near and Not, as well as mathematical and special symbols, in queries. With operators and/or symbols, the user links keywords in the desired sequence to get the search result that best fits the query. Request forms are shown in Table 1.

Table 1

A simple query returns a large number of links, because the list includes documents containing any one of the words entered in the query, or a simple phrase (see Table 1). The and operator specifies that all keywords must appear in the document. However, the number of documents may still be large, and reviewing them may take a long time. Therefore, in some cases it is much more convenient to use the near context operator, which specifies that the words must be located sufficiently close to each other in the document. Using near greatly reduces the number of documents found. The "*" symbol in the query string means that the word is searched for by its mask. For example, writing "gov*" in the query string returns a list of documents containing words that start with "gov", such as government, governor, etc.
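The near operator and the "*" mask can be sketched as two small functions. This is an illustration only: the default window of 10 words is an assumption of the sketch, the helper names are hypothetical, and real engines define proximity in their own ways.

```python
import re

def tokenize(text):
    """Lowercase words in document order."""
    return re.findall(r"\w+", text.lower())

def near(text, a, b, window=10):
    """True if words a and b occur within `window` word positions of each other."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)

def prefix_matches(text, mask):
    """Words in the text that fit a mask such as 'gov*' (prefix followed by anything)."""
    prefix = mask.rstrip("*").lower()
    return sorted({t for t in tokenize(text) if t.startswith(prefix)})

text = "The government met the governor; a goose was nearby."
print(prefix_matches(text, "gov*"))
print(near("online courses lead to degrees", "online", "degrees", window=4))
```

Tightening the window makes near stricter: with `window=3` the last call no longer matches, because the two words are four positions apart.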

The most developed search service for Russian-language information is provided by the Yandex search server. In Yandex, you can simply write a phrase in Russian describing what you want to find, and the system will analyze and process your request and then try to find everything related to the topic. Alternatively, you can use special operators to compose a query string that tells the search engine exactly what requirements the information must meet.

The equally popular search engine Rambler maintains statistics on visits to links from its own database. It supports the same logical operators AND, OR and NOT, the metacharacter * (similar to the * character in AltaVista, which widens the query), and the weighting symbols + and -, which increase or decrease the significance of words entered in the query.

Let's look at the most popular information search technologies on the Internet.

Lecture 4. Information retrieval tools

The constant updating of the information array, combined with the growing volume of data, makes it extremely difficult to keep track of the available documents and, accordingly, to search them. Search can be conditionally divided into:

  • factual search: in encyclopedias, reference books, dictionaries;
  • bibliographic search: libraries, directories, programs;
  • document search: electronic documents, electronic libraries, electronic journals.

The importance of the problem of information retrieval has led to the formation of an entire industry whose task is to help users navigate cyberspace. This industry consists of special search services. They are traditionally divided into:

  • directories or catalogs
  • search engines

These varieties look very similar, because "each directory has its own search engine, and each search engine has its own directory". However, they work on completely different approaches and technologies. In addition, each type of search service is suited to a certain type of problem. Information retrieval involves the use of certain strategies, methods, mechanisms and means. The behavior of the user who manages the search process is determined not only by the information need but also by the instrumental variety of the system, that is, the technologies and tools it provides. The choice of instrument largely determines the search strategy and search technologies.

Search technologies are unified (optimized within a specific information retrieval system) sequences for the effective use of individual search tools in the course of the user's interaction with the system.

The search technologies used in information systems can be divided into 3 categories:

  • thematic catalogs and specialized catalogs (online directories);
  • search engines (full-text search);
  • meta search tools.

Thematic catalogs process documents and assign each one to one of several categories from a predetermined list. In fact, this is the classification-based indexing familiar to all librarians. Specialized catalogs, or guides, are created for individual fields and topics.

Search engines (the most advanced search facility on the Internet) implement full-text search technology: the texts located on the polled servers are indexed.

With metasearch tools, a query is executed by several search engines simultaneously, and the results are combined into a single list sorted by relevance.

Search tools are an interdependent complex of information retrieval languages and data definition/management languages that provides structural and semantic transformations of the objects being processed (documents, dictionaries, collections of search results).

1. Directories

Search tools of the first group are electronic directories that have a clear hierarchical systematic or logical-thematic structure, much like the structure of a systematic library catalog. Working with directories lets you navigate Internet resources within individual branches of knowledge, moving from the general to the particular, switching between hierarchical branches, going back a few steps, etc.

Among the Russian developments in this area are:

  • Aport (address: www.aport.ru),
  • List.ru (address: list.mail.ru),
  • Weblist (address: www.weblist.ru),
  • Ivan Susanin (address: www.susanin.net),
  • Snail (address: www.ulitka.ru).

The main distinguishing feature of directories is that they are compiled by hand. The editorial staff of each directory, whose work resembles that of the cataloging and systematization departments of large libraries, regularly review the contents of newly appeared servers and track changes on existing ones. The data found are analyzed and entered into the sections of the directory in accordance with the accepted classification. The description of the server as a whole (or of a section, if it forms a fully independent block) is supplied with a brief annotation containing general information about the nature of the material available. In some cases, additional information is recorded about the language of the documents, the resource's traffic, its physical location, etc.

The main parameters characterizing the advantages of directories are:

  • volume;
  • the efficiency of reflecting new or changed resources;
  • logic and consistency of the hierarchical classification scheme;
  • cross-referencing within the structure.

The volume of the directory determines the degree of its reliability or "information strength". In some systems, a special mechanism periodically checks each site's availability and removes it from the list after a long "absence" from the Web. The logical (scientific) soundness of the classification scheme determines how easily users find the required information. A system of cross-references makes it possible to locate information by different approaches (for example, territorial or sectoral). In this case, the classification scheme should lead the user to the desired object no matter which search path is chosen.
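The idea of cross-references can be shown with a small sketch in which every resource is filed under both a territorial and a sectoral heading. The resources, heading names and the `build_cross_index` helper are hypothetical and serve only as an illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical resources: (address, territorial heading, sectoral heading).
RESOURCES: List[Tuple[str, str, str]] = [
    ("kazan-library.example", "Kazan", "Libraries"),
    ("kazan-transport.example", "Kazan", "Transport"),
    ("moscow-library.example", "Moscow", "Libraries"),
]


def build_cross_index(
    resources: List[Tuple[str, str, str]]
) -> Dict[Tuple[str, str], Set[str]]:
    """File every resource under both of its headings."""
    by_heading: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for url, territory, sector in resources:
        by_heading[("territory", territory)].add(url)
        by_heading[("sector", sector)].add(url)
    return by_heading


cross_index = build_cross_index(RESOURCES)
via_territory = cross_index[("territory", "Kazan")]   # territorial search path
via_sector = cross_index[("sector", "Libraries")]     # sectoral search path
found = via_territory & via_sector                    # reachable by either path
```

Whichever heading the user starts from, the same resource is in the listing, which is the property the classification scheme is expected to guarantee.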

The ability to compose a query plays no special role for this type of search tool. Complex searches that require a detailed request are not carried out using catalogs.

Directories are designed to solve three types of tasks:

  • orientation in an unfamiliar branch of knowledge;
  • search for large objects, which are, for example, servers of organizations or significant projects;
  • obtaining a ready-made list of resources whose search profile is loosely defined (libraries of a certain type, transport schedules, websites of political parties, etc.).

A resource directory can again be compared with a library's systematic catalogue, in which a book (here, an entire site) is represented only by a description and an annotation.

2. Search engines

Search engines work on completely different technological principles. Their task is to provide a detailed search for information, which can only be achieved by recording (indexing) the content of as many web pages as possible. Unlike directories, search engines operate in an automated mode and share a uniform principle of operation.

Search engines consist of two basic components. The first component is a robot program, whose task is to move from server to server and find new (or changed) documents there, downloading them to the main computer of the system. The robot looks through the content of a document and finds new links, both to other documents on the same server and to external sites. The program then follows those links on its own and finds new documents, after which the process repeats, recalling the well-known "snowball method" in bibliography. The documents found are processed (indexed) by the second component of the search engine. As a rule, all the content of the page is taken into account, including text, illustrations, audio and video files. All words in the document are indexed, which makes it possible to use search engines for detailed searches on the narrowest topics. The resulting index files, which store information about which keyword is used, how many times, in which document and on which server, make up the database that the librarian accesses when entering combinations of keywords in the query string.
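The two components can be sketched together in a few lines. This is a simplified illustration under stated assumptions: the "web" is a hypothetical in-memory dictionary instead of real servers, only text is indexed, and the names `WEB`, `crawl` and `build_index` are invented for the example.

```python
import re
from collections import Counter, defaultdict, deque
from typing import Dict, List, Tuple

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
WEB: Dict[str, Tuple[str, List[str]]] = {
    "a.example/": ("library catalog search", ["a.example/news", "b.example/"]),
    "a.example/news": ("new library opened", ["a.example/"]),
    "b.example/": ("catalog of catalog servers", []),
}


def crawl(start: str) -> List[str]:
    """Robot: follow links breadth-first, visiting every reachable page once."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in WEB[url][1]:
            if link in WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return order


def build_index(urls: List[str]) -> Dict[str, Dict[str, int]]:
    """Indexer: word -> {url: number of occurrences of the word on that page}."""
    index: Dict[str, Dict[str, int]] = defaultdict(dict)
    for url in urls:
        words = re.findall(r"\w+", WEB[url][0].lower())
        for word, count in Counter(words).items():
            index[word][url] = count
    return index


pages = crawl("a.example/")
index = build_index(pages)
```

Looking up `index["catalog"]` then answers the question the text describes: in which documents the keyword occurs and how many times in each.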

The results are output by a special module that performs intelligent ranking of results. This takes into account:

  • the location of the term in the document (title, headings, body text), the frequency of its repetition,
  • percentage ratio of the search term to the text of the page,
  • the number and authority of external links to this page from other sites.
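How such factors might be combined can be shown with a toy scoring function. The weights are arbitrary illustrative values, link authority is reduced to a simple count of inbound links, and the `Page` structure is hypothetical. Real engines do not publish their formulas.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    url: str
    title: str
    body: str
    inbound_links: int  # number of external pages linking here (assumed known)


def score(page: Page, term: str) -> float:
    """Combine the three factor groups with arbitrary illustrative weights."""
    term = term.lower()
    title_words = page.title.lower().split()
    body_words = page.body.lower().split()
    in_title = 1.0 if term in title_words else 0.0    # location of the term
    frequency = body_words.count(term)                 # repetitions in the body
    share = frequency / len(body_words) if body_words else 0.0  # term-to-text ratio
    return 3.0 * in_title + 1.0 * frequency + 2.0 * share + 0.5 * page.inbound_links


def rank(pages: List[Page], term: str) -> List[str]:
    """Return URLs ordered from the highest score to the lowest."""
    return [p.url for p in sorted(pages, key=lambda p: score(p, term), reverse=True)]


pages = [
    Page("a.example", "Library news", "the library opened a new hall", 1),
    Page("b.example", "Contacts", "library library library", 0),
    Page("c.example", "Library catalog", "search the library catalog of the library", 4),
]
ranking = rank(pages, "library")
```

With these weights the page that is both well linked and has the term in its title comes first, while a page that merely repeats the word ranks below it.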

The basic parameters of search engines include:

  • number of indexed servers and individual documents (volume of index files);
  • the degree of efficiency of updating the database by including information about new materials and deleting obsolete ones;
  • possibilities for making a request;
  • intelligence of the search results ranking system;
  • the presence of additional service functions that facilitate the work of the user.

How accurately a search engine lets the user express a query largely determines the quality of the results obtained. Each engine has its own query vocabulary, which allows the search request to be detailed in different ways.

All search engines have a search results ranking module. It is another basic component of every system. The list of factors taken into account when determining a document's place in the list of links is unusually wide: from the location of the word on the page to the rating (authority) of the pages that link to the found document.

  • Google (address: www.google.com),
  • AlltheWeb (address: www.alltheweb.com),
  • Alta Vista (address: www.altavista.com).

Similar search tools exist in Russia. All of them are designed to work with Russian-language documents and have a powerful http://www.metabot.ru).

Conclusions on the topic of the lecture block

The search engine selects pages from the database in accordance with the query, then orders the pages by decreasing degree of match (note A.A.)

In this case, there is a direct analogy with the principles of operation of a library's distributed union catalogs. The key capability of metasearch is the ability to send a user's request to several search engines at once and then combine the results. (note by A.A.)

When turning to directories, the librarian can expect only very general information on a subject and never detailed data: for the server of a large corporation containing thousands of pages, the directory will show only the name and a few lines of annotation.
