Modern methods of information search

1. Introduction

The volume of the Internet grows many times over every year, so the chance that the information you need is available somewhere increases dramatically. The Internet connects millions of computers and many different networks, and the number of users grows by 15-80% annually. Nevertheless, when people go online, the main problem is increasingly not a lack of the required information but the ability to find it. As a rule, an ordinary person, for various reasons, cannot or does not want to spend more than 15-20 minutes searching for an answer. It is therefore especially important to learn what seems a simple thing: where and how to look in order to get the DESIRED answers.

To find the information you need, you have to find its address. For this purpose there are specialized search servers: index robots (search engines), thematic Internet directories, meta-search systems, people search services, and others. This master class covers the main technologies for searching information on the Internet, describes the common features of search tools, and examines the structure of search queries for the most popular Russian-language and English-language search engines.

2. Search technologies

The World Wide Web (WWW) is a technology for preparing and publishing documents on the Internet. The WWW includes web pages, electronic libraries, catalogs, and even virtual museums! With such an abundance of information, the question arises: "How do you navigate such a huge information space?"
Search tools come to the rescue in solving this problem.

2.1 Search tools

Search tools are special software whose main purpose is to give Internet users efficient, high-quality information search. Search tools are hosted on special web servers, each of which performs a specific function:

  1. Analyzing web pages and entering the results of the analysis into the appropriate level of the search server's database.
  2. Searching for information at the user's request.
  3. Providing a user-friendly interface for searching for information and viewing the results.

The techniques used with the various search tools are almost the same. Before discussing them, let's define the following concepts:

  1. The search tool interface is presented as a page with hyperlinks, a query string (search string) and query activation tools.
  2. The search engine index is an information base containing the result of the analysis of web pages, compiled according to certain rules.
  3. A query is a keyword or phrase that the user enters into the search bar. Special characters ("", ~), mathematical symbols (*, +, ?) are used to form various queries.

The scheme for searching for information on the Internet is simple. The user types a key phrase and starts the search, receiving a selection of documents that match the query. This list is ranked according to certain criteria so that the documents that most closely match the query appear at the top. Each search tool uses its own ranking criteria, both when analyzing search results and when building the index (filling the index database of web pages).

Thus, if you enter the same query into each search tool, you may get different results. What matters most to the user is which documents appear among the first two or three dozen results and how well they match the user's expectations.

Most search tools offer two ways to search: simple search and advanced search, with or without a special query form. Let's consider both types of search using an English-language search engine as an example.

For example, AltaVista is useful for arbitrary queries such as "Something about online degrees in information technology", while the Yahoo search tool lets you get world news, exchange rate information or the weather forecast.

Mastering query-refinement criteria and advanced search techniques increases search efficiency and helps you find the necessary information quickly. First of all, efficiency improves when queries use the logical operators OR, AND, NEAR and NOT, as well as mathematical and special symbols. With operators and symbols, the user links the keywords in the desired sequence to get the results that best fit the query. Query forms are shown in Table 1.

Table 1

A simple query returns a large number of links to documents, because the list includes every document containing any one of the entered words, or a simple phrase (see Table 1). The AND operator specifies that all keywords must appear in the document. However, the number of documents may still be large, and reviewing them can take a long time. In some cases it is therefore much more convenient to use the NEAR context operator, which requires the words to appear close to each other in the document; using NEAR greatly reduces the number of documents found. The symbol "*" in the query string means that the word is searched for by its mask: for example, writing "gov*" returns documents containing words that begin with "gov", such as government and governor.
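The operators above can be illustrated with a small Python sketch. The four-document collection, the function names and the default 10-word NEAR window are assumptions made for illustration; real engines such as AltaVista defined their own syntax and proximity rules, and matched against an index rather than scanning texts.

```python
import re

# Hypothetical four-document collection, invented for illustration.
DOCS = {
    1: "the government announced a new budget",
    2: "the governor met the city council",
    3: "budget news and weather forecast",
    4: "online degrees in information technology",
}


def tokens(text: str) -> list[str]:
    """Lower-case words of a document, in order of appearance."""
    return re.findall(r"\w+", text.lower())


def has_term(words: list[str], term: str) -> bool:
    """A term ending in '*' is a mask (prefix match); otherwise it must match a whole word."""
    if term.endswith("*"):
        return any(w.startswith(term[:-1]) for w in words)
    return term in words


def search_or(*terms: str) -> set[int]:
    # Simple query: any one of the words is enough.
    return {d for d, t in DOCS.items() if any(has_term(tokens(t), x) for x in terms)}


def search_and(*terms: str) -> set[int]:
    # AND: every keyword must be present.
    return {d for d, t in DOCS.items() if all(has_term(tokens(t), x) for x in terms)}


def search_not(include: str, exclude: str) -> set[int]:
    # NOT: the first word must be present and the second must not.
    return {d for d, t in DOCS.items()
            if has_term(tokens(t), include) and not has_term(tokens(t), exclude)}


def search_near(a: str, b: str, window: int = 10) -> set[int]:
    # NEAR: both words occur at most `window` words apart.
    hits = set()
    for d, t in DOCS.items():
        words = tokens(t)
        pos_a = [i for i, w in enumerate(words) if w == a]
        pos_b = [i for i, w in enumerate(words) if w == b]
        if any(abs(i - j) <= window for i in pos_a for j in pos_b):
            hits.add(d)
    return hits


print(search_or("budget", "governor"))  # {1, 2, 3}
print(search_and("budget", "news"))     # {3}
print(search_or("gov*"))                # {1, 2}
```

Each additional constraint (AND, NOT, a tighter NEAR window) can only shrink the result set, which is why refined queries return fewer documents than a simple query.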

The equally popular search engine Rambler keeps statistics on link traffic from its own database. It supports the same logical operators AND, OR and NOT, the metacharacter * (which, like * in AltaVista, broadens the query), and the weighting symbols + and -, which increase or decrease the significance of the words entered in the query.

Let's look at the most popular information search technologies on the Internet.

2.2 Search engines

Web search engines are servers with a huge database of URLs. They automatically access the WWW pages at all these addresses, examine their contents, and extract keywords from the pages into their database (index the pages).

Moreover, search engine robots follow the links they encounter on these pages and index the pages those links lead to. Since almost every WWW page links to many other pages, a search engine working this way can, in theory, eventually visit every site on the Internet.
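The link-following behaviour can be sketched as a breadth-first traversal. This is a simplified model, not any particular engine's robot: the link graph below is invented and stands in for fetching real pages, and `max_pages` is a hypothetical limit.

```python
from collections import deque

# Hypothetical link graph standing in for the Web: page -> links found on that page.
LINKS = {
    "a.example/": ["a.example/news", "b.example/"],
    "a.example/news": ["a.example/"],
    "b.example/": ["c.example/", "a.example/news"],
    "c.example/": [],
    "d.example/": ["a.example/"],  # no page links here, so it is never reached
}


def crawl(seeds: list[str], max_pages: int = 1000) -> list[str]:
    """Breadth-first crawl: visit each page once and queue the links found on it."""
    queue = deque(seeds)
    seen = set(seeds)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)  # a real robot would fetch and index the page here
        for link in LINKS.get(url, []):
            if link not in seen:  # skip pages already queued or visited
                seen.add(link)
                queue.append(link)
    return visited


print(crawl(["a.example/"]))
# ['a.example/', 'a.example/news', 'b.example/', 'c.example/']
```

The page d.example/ is never visited because no crawled page links to it, which is one reason a robot reaches "all sites" only in theory.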

This type of search tool is the best known and most popular among Internet users. Everyone knows the names of well-known web search engines such as Yandex, Rambler and Aport.

To use this type of search tool, go to its site and type the keyword you are interested in into the search bar. You will then receive the links stored in the search engine's database that most closely match your query. To make the search as effective as possible, keep the following points in mind:

  • Decide on the topic of the query: what exactly are you looking for?
  • Pay attention to language, grammar, the use of non-alphabetic characters, and morphology. It is also important to formulate and enter keywords correctly. Each search engine has its own query syntax: the principle is the same, but the symbols or operators may differ. The supported query forms also depend on the sophistication of the search engine's software and the services it provides. In any case, every search engine has a "Help" section that explains all the syntax rules, along with recommendations and search tips, in an accessible way.
  • Use the capabilities of different search engines. If you can't find something on Yandex, try Google. Use advanced search services.
  • To exclude documents containing certain terms, put a "-" sign before each such word. For example, if you want information about the works of Shakespeare except "Hamlet", enter the query "Shakespeare -Hamlet". To make sure a word is necessarily present in the results, use the "+" symbol: to find links specifically about selling cars, enter the query "sale +car". Combining these symbols increases the efficiency and accuracy of your search.
  • Each link in the list of search results comes with several lines from the found document that contain your keywords. Before clicking a link, judge how relevant this snippet is to your query. After following a link to a site, look carefully around its main page: as a rule, the first page is enough to tell whether you have come to the right place. If so, continue searching for the necessary information within the site's sections; if not, return to the search results and try the next link.
  • Remember that search engines do not produce information of their own (apart from explanations about themselves). A search engine is only an intermediary between the owner of the information (the website) and you. Its databases are constantly updated with new addresses, but they still lag behind the information that actually exists, simply because search engines don't work at the speed of light.
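A simplified model of the "+" and "-" signs follows, assuming that "+word" is mandatory, "-word" excludes a document, and the remaining words need only partly match. Real engines differ in details such as whether a space is allowed after the sign, and they rank results rather than merely filter them; the documents are invented.

```python
def parse_query(query: str) -> tuple[list[str], list[str], list[str]]:
    """Split a query into required (+word), excluded (-word) and ordinary words."""
    required, excluded, ordinary = [], [], []
    for word in query.lower().split():
        if word.startswith("+") and len(word) > 1:
            required.append(word[1:])
        elif word.startswith("-") and len(word) > 1:
            excluded.append(word[1:])
        else:
            ordinary.append(word)
    return required, excluded, ordinary


def filter_docs(docs: dict[str, str], query: str) -> list[str]:
    """Return titles of documents that satisfy the +/- constraints of the query."""
    required, excluded, ordinary = parse_query(query)
    hits = []
    for title, text in docs.items():
        words = set(text.lower().split())
        if any(w in words for w in excluded):
            continue  # "-word": drop any document containing it
        if not all(w in words for w in required):
            continue  # "+word": the word must be present
        if not required and not any(w in words for w in ordinary):
            continue  # no mandatory words: at least one ordinary word must match
        hits.append(title)
    return hits


# Hypothetical documents, invented for illustration.
DOCS = {
    "sonnets": "Shakespeare sonnets and poems",
    "hamlet": "Shakespeare Hamlet prince of Denmark",
    "cars": "used car sale in Moscow",
    "boats": "boat sale at the harbour",
}

print(filter_docs(DOCS, "Shakespeare -Hamlet"))  # ['sonnets']
print(filter_docs(DOCS, "sale +car"))            # ['cars']
```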

The best-known web search engines include Google, Yahoo, AltaVista, Excite, HotBot and Lycos. Among Russian-language engines, Yandex, Rambler and Aport stand out.

Search engines are the largest and most valuable, but far from the only sources of information on the Web, because there are other ways to search the Internet besides them.

2.3 Directories

A catalog of Internet resources is a constantly updated and expanded hierarchical catalog containing many categories and individual web servers, each with a brief description of its contents. Searching a catalog means "moving down the steps", that is, moving from more general categories to more specific ones. One advantage of thematic directories is that the annotations to the links are written by the directory's creators and reflect the content of each server, so you can judge more accurately how well a server matches the purpose of your search.
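The "moving down the steps" idea can be sketched as walking a nested dictionary. The category names are borrowed from the Yandex catalog examples elsewhere in this text; the leaf entries and the `descend` helper are hypothetical.

```python
# Hypothetical fragment of a thematic catalog. The category names follow the
# Yandex catalog examples elsewhere in this text; the leaf entries are invented.
CATALOG = {
    "Culture and art": {
        "Music": {
            "Author's song": ["bard songs archive", "lyrics collection"],
        },
    },
    "Computers and communications": {
        "Mobile communications": {
            "Mobile phones": ["phone specifications site"],
        },
    },
}


def descend(catalog: dict, path: list[str]) -> list[str]:
    """Move from general to specific categories, one level per step of `path`."""
    node = catalog
    for category in path:
        if not isinstance(node, dict) or category not in node:
            raise KeyError(f"no category {category!r} at this level")
        node = node[category]
    # A dict is still a branch (show its subcategories); a list is a final rubric.
    return list(node) if isinstance(node, dict) else node


print(descend(CATALOG, []))                   # ['Culture and art', 'Computers and communications']
print(descend(CATALOG, ["Culture and art"]))  # ['Music']
print(descend(CATALOG, ["Culture and art", "Music", "Author's song"]))
```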

An example of a thematic Russian-language catalog is the resource http://www.ulitka.ru/.

The main page of this site has a thematic rubricator, which the user navigates to reach the rubric containing links to the resources of interest.

In addition, some subject directories allow keyword search. The user enters the desired keyword in the search bar and receives a list of links, with descriptions, to the sites that most closely match the query. Note that this search is performed not on the content of the WWW servers but on their brief descriptions stored in the directory.

In our example, the directory can also sort sites by number of visits, alphabetically, or by date of entry.
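Directory search, which looks only at the stored descriptions, can be sketched together with the sort orders just mentioned. The records, field names and sort keys below are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Entry:
    """One directory record: only `description` is searched, never the site itself."""
    url: str
    description: str
    visits: int
    added: date


# Hypothetical records, invented for illustration.
ENTRIES = [
    Entry("fish.example", "Aquarium fish catalog with photos", 1200, date(2005, 3, 1)),
    Entry("tanks.example", "Aquarium equipment shop", 300, date(2006, 7, 15)),
    Entry("band.example", "Fan site of a rock band", 5000, date(2004, 1, 20)),
]


def search_descriptions(keyword: str, sort_by: str = "visits") -> list[str]:
    kw = keyword.lower()
    hits = [e for e in ENTRIES if kw in e.description.lower()]
    if sort_by == "visits":
        hits.sort(key=lambda e: e.visits, reverse=True)  # most visited first
    elif sort_by == "alpha":
        hits.sort(key=lambda e: e.url)  # alphabetical by address
    elif sort_by == "date":
        hits.sort(key=lambda e: e.added, reverse=True)  # most recently added first
    else:
        raise ValueError(f"unknown sort order: {sort_by!r}")
    return [e.url for e in hits]


print(search_descriptions("aquarium"))                  # ['fish.example', 'tanks.example']
print(search_descriptions("aquarium", sort_by="date"))  # ['tanks.example', 'fish.example']
```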

Other examples of Russian-language directories:
[email protected]
Weblist
Vsego.ru
Among the English-language directories are:
http://www.dmoz.org
http://www.yahoo.com/
http://www.looksmart.com

2.4 Collections of links

Link collections are sets of links sorted by topic. They differ considerably in content, so to find the selection that best suits your interests, you need to browse through several of them and form your own opinion.

As an example, consider the "Treasures of the Internet" link collection from JSC "Relcom".

On the collection's CONTENT page, the user clicks on any section of interest, for example:

  • For motorists
  • Astronomy and astrology
  • Your house
  • Your pets
  • Children are the flowers of life
  • Leisure
  • Cities on the Internet
  • Health and medicine
  • Information agencies and services
  • Museum of local lore, etc.

and gets the list of links collected in that section. The "For motorists" section, for instance, includes:

  • Automotive electronics
  • Antique Automotive Museum
  • Board of Legal Protection of Car Owners
  • sportdrive

    The advantage of this type of search tool is its focus: a selection usually includes rare Internet resources chosen by a particular webmaster or web page owner.

    2.5 Address databases

    Address databases are special search servers that usually classify sites by type of activity, by products and services offered, and by geography; sometimes an alphabetical search is also available. The database records store information about sites that provide the e-mail address, organization and postal address for a fee.

    The largest English-language address database is arguably http://www.lookup.com/.

    By moving into its subdirectories, the user finds links to sites that offer the information of interest.

    We are not aware of any widely available official address databases for the Russian Federation.

    2.6 Searching Gopher archives

    Gopher is an interconnected system of servers (Gopher space) distributed over the Internet.

    Gopher space holds the richest literary library, but its materials are not available for remote viewing: the user can only browse the hierarchically organized table of contents and select a file by title. A special program, Veronica, automates this search using keyword queries.

    Until 1995, Gopher was the most dynamic Internet technology: the growth rate of the number of related servers outpaced the growth rate of servers of all other types of Internet. In the EUnet/Relcom network, Gopher servers have not received active development, and today almost no one remembers them.

    2.7 FTP File Search System (FTP Search)

    An FTP file search engine is a special type of Internet search engine that allows you to find files available on "anonymous" FTP servers. The FTP protocol is designed to transfer files over a network, and in this sense, it is functionally a kind of analogue of Gopher.

    The main search criterion is the file name specified in various ways (exact match, substring, regular expression, etc.). This type of search, of course, cannot compete with search engines in terms of capabilities, since the contents of files are not taken into account in any way during the search, and files, as you know, can be given arbitrary names. However, if you need to find some well-known program or standard description, then with a high degree of probability the file containing it will have the appropriate name, and you can find it using one of the FTP Search servers:

    FileSearch searches FTP servers by the names of files and directories. If you are looking for a program or something similar, you will most likely find its description on WWW servers, and you can then download it from FTP servers.
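The three ways of specifying a file name (exact match, substring, regular expression) can be illustrated with a small sketch over an invented file listing. It is not the interface of FileSearch or any other real FTP Search server; note that only names are compared, never file contents.

```python
import re

# Hypothetical listing of files on anonymous FTP servers: (server, path).
LISTING = [
    ("ftp.example.org", "/pub/gnu/gzip-1.2.4.tar.gz"),
    ("ftp.example.org", "/pub/rfc/rfc959.txt"),
    ("ftp.example.net", "/mirror/rfc/rfc959.txt"),
    ("ftp.example.net", "/pub/games/tetris.zip"),
]


def file_search(pattern: str, mode: str = "substring") -> list[str]:
    """Match the pattern against file names only; contents are never examined."""
    results = []
    for server, path in LISTING:
        name = path.rsplit("/", 1)[-1]
        if mode == "exact":
            found = name == pattern
        elif mode == "substring":
            found = pattern.lower() in name.lower()
        elif mode == "regex":
            found = re.search(pattern, name) is not None
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        if found:
            results.append(f"ftp://{server}{path}")
    return results


print(file_search("rfc959.txt", mode="exact"))
print(file_search(r"^rfc\d+\.txt$", mode="regex"))
print(file_search("GZIP"))  # ['ftp://ftp.example.org/pub/gnu/gzip-1.2.4.tar.gz']
```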

    2.8 Search engine in Usenet News conferences

    USENET NEWS is the teleconferencing system of the Internet community; in the West this service is simply called news. The so-called "echoes" of the FidoNet network are a close analogue of these teleconferences.

    From a subscriber's point of view, USENET is a bulletin board divided into sections where you can find articles on everything from politics to gardening. The board is accessed from a computer, much like e-mail. Without leaving your computer, you can read or post articles to a particular conference, find useful advice or join discussions. Naturally, articles take up space on the servers, so they are not stored forever but are periodically deleted to make room for new ones. Worldwide, the best service for finding information in Usenet conferences is the Google Groups server (Google Inc.).

    Google Groups is a free online community and discussion group service that offers the largest archive of Usenet messages on the Internet (more than a billion messages). For more information about the terms of use of the service, please visit http://groups.google.com/intl/ru/googlegroups/tour/index.html

    Among Russian-language services, the USENET World System server and the Relcom teleconferences stand out. As in other search services, the user types a query string and the server generates a list of conferences containing the keywords; you then subscribe to the selected conferences in your news program. There is also a similar Russian server, FidoNet Online, which makes Fido conferences available on the WWW.

    2.9 Meta search systems

    For a quick search in the databases of several search engines at once, it is better to turn to meta-search systems.

    Meta search engines send your query to many different search engines, then process the results, remove duplicate resource addresses, and present a broader picture of what is available on the Internet.
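Merging and de-duplicating results can be sketched as follows. We do not know how Search.com actually combined results; this uses round-robin interleaving (one simple technique among several), a crude address normalization, and invented result lists.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Crude address key: ignores scheme, case, a leading 'www.' and a trailing slash."""
    parts = urlsplit(url.lower())
    host = parts.netloc.removeprefix("www.")
    return host + parts.path.rstrip("/")


def meta_search(result_lists: list[list[str]]) -> list[str]:
    """Interleave ranked lists from several engines, dropping duplicate addresses."""
    merged: list[str] = []
    seen: set[str] = set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):  # every engine's 1st hit, then every 2nd hit, ...
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical result lists from two engines.
engine_a = ["http://www.fish.example/", "http://tanks.example/shop"]
engine_b = ["https://fish.example", "http://care.example/"]
print(meta_search([engine_a, engine_b]))
# ['http://www.fish.example/', 'http://tanks.example/shop', 'http://care.example/']
```

Interleaving by rank keeps each engine's top hits near the top of the merged list; weighting engines or scoring by how many engines returned a page are common alternatives.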

    The world's most popular meta-search engine is Search.com.

    The unified Search.com search engine from CNET, Inc. covers almost two dozen search engines, and links to it can be found all over the Internet.

    With this type of search tool the user can search many search engines at once; the drawback of these systems is their instability.

    2.10 People search systems

    People search systems are special servers for finding people on the Internet: the user can enter a person's full name and get their e-mail address and URL. Note, however, that people search engines generally take e-mail addresses from open sources, such as Usenet forums. Among the best-known people search systems are:

    Finding e-mail addresses: by filling in the special search fields for contact information (First Name, City, Last Name, Phone number), you can find the information you are interested in.

    People search engines are really large servers: their databases contain about 6,000,000 addresses.
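A search form of this kind amounts to filtering records on whichever fields the user filled in. The records and the `find_people` helper are hypothetical; real services had their own data sources and matching rules.

```python
from dataclasses import dataclass


@dataclass
class Person:
    first_name: str
    last_name: str
    city: str
    phone: str
    email: str


# Hypothetical directory records, invented for illustration.
DIRECTORY = [
    Person("Ivan", "Petrov", "Moscow", "+7 495 000-00-01", "ivan.petrov@example.com"),
    Person("Ivan", "Petrov", "Kazan", "+7 843 000-00-02", "i.petrov@example.com"),
    Person("Anna", "Smirnova", "Moscow", "+7 495 000-00-03", "anna@example.com"),
]


def find_people(**fields: str) -> list[str]:
    """Return e-mail addresses of records matching every filled-in field (case-insensitive)."""
    return [
        person.email
        for person in DIRECTORY
        if all(getattr(person, name).lower() == value.lower() for name, value in fields.items())
    ]


print(find_people(first_name="Ivan", last_name="Petrov"))  # ['ivan.petrov@example.com', 'i.petrov@example.com']
print(find_people(last_name="Petrov", city="Kazan"))        # ['i.petrov@example.com']
```

Each extra field narrows the result, which is why filling in the city or phone number helps disambiguate common names.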

    3. Conclusion

    We have reviewed the main technologies for searching information on the Internet, outlined the search tools that currently exist, and examined the structure of search queries for the most popular Russian-language and English-language search engines. Summing up, there is no single optimal scheme for searching information on the Internet. Depending on the specific information you need, you can choose the appropriate search tools and services, and the quality of the search results depends on how competently those services are chosen.


    The laws of friction and heat and mass transfer in a turbulent boundary layer

    There are several ways of representing the "law of friction" (for the reference case), and they lead to almost identical results. In accordance with the concept of a "logarithmic" boundary layer (with the first turbulence constant χ = 0.4), the friction law for fully developed turbulence with "vanishing viscosity" is well approximated by a simple Karman formula:

    With a power-law representation of the velocity profile, the following formula is proposed:

    where n is the exponent of the velocity profile;

    – semi-empirical coefficient;

    A – empirical coefficient;

    δ is the thickness of the boundary layer.

    Using the relations between Reynolds numbers based on different length scales:

    For a turbulent boundary layer that develops from the leading edge (x_cr = 0), the law of friction can also be written as:

    The values of the parameters in the above formulas for various velocity profiles are summarized in the table:

    Parameter   n = 1/7   n = 1/8   n = 1/9   n = 1/10
    A           8.74      9.71      10.6      11.5
                0.0975    0.089     0.0818    0.0757
                1.28      1.25      1.22      1.20
    m           0.250     0.222     0.200     0.182
    B           0.0252    0.0206    0.0190    0.0148
    m1          0.200     0.182     0.167     0.154
    B1          0.0576    0.0450    0.0362    0.0308
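Several rows of the table can be cross-checked. Assuming the usual power-law profile u/U = (y/δ)^n, the exponents in the table match m = 2n/(1+n) and m1 = 2n/(1+3n); the latter equals m/(1+m), which is what integrating the momentum integral equation from the leading edge gives. The two unlabeled rows agree, to within about half a percent, with the momentum-thickness ratio δ**/δ = n/((1+n)(1+2n)) and the shape factor H = δ*/δ** = 1+2n of that profile. This identification is our inference from the numbers, not something the source states.

```python
def power_law_parameters(n: float) -> dict[str, float]:
    """Quantities implied by an assumed power-law profile u/U = (y/delta)**n."""
    return {
        "delta2/delta": n / ((1 + n) * (1 + 2 * n)),  # momentum thickness over delta
        "H": 1 + 2 * n,                               # shape factor delta*/delta**
        "m": 2 * n / (1 + n),                         # exponent of Re** in the friction law
        "m1": 2 * n / (1 + 3 * n),                    # exponent of Re_x, equal to m/(1+m)
    }


for denom in (7, 8, 9, 10):
    params = power_law_parameters(1 / denom)
    row = ", ".join(f"{name} = {value:.4f}" for name, value in params.items())
    print(f"n = 1/{denom}: {row}")
```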

    Other forms of the law of friction are also known and used, leading to practically the same results. For example, V.M. Ievlev proposed the approximation:

    Formulas for the laws of heat and mass transfer are obtained from the "laws of friction" for standard conditions (the reference case) by means of the well-known principle of Reynolds' triple analogy.

    where S is a correction factor (the Reynolds analogy factor) that accounts for departures from the standard conditions (s). To a first approximation, S is satisfactorily described by the relation:

    Note that when integral parameters are used, the "laws" of heat and mass transfer are well described by the dependencies:


    The most developed search service for Russian-language information is the Yandex search server. In Yandex you can simply type, in Russian, a phrase describing what you want to find; the system analyzes the request and then tries to find everything related to that topic. Alternatively, you can use special operators to compose a query string that tells the search engine exactly what requirements the information must meet. Some of the Yandex query language operators are described at http://help.yandex.ru/search/?id=481939


    Topic 3. Working with Internet search engines


    After studying this topic, you will learn or review:

    - what search servers are for;
    - the purpose of the main parts of a search server;
    - what types of information search exist on the Internet;
    - the basic rules for composing a query in the Yandex search engine.

    Search by URLs

    The fastest and most reliable way to find information on the Internet is to search by URL. Many addresses are printed in publications and special reference books, or announced on popular radio stations and TV channels.

    ♦ Zenit football club fans know the address www.fc-zenit.ru by heart.
    ♦ Fans of the group "Korol i Shut" are well aware of the official site of this group www.korol.spb.ru.
    ♦ Fans of the NTV channel can easily find its website at www.ntv.ru.

    For quick access to any of these resources, just launch a browser, such as Internet Explorer, and type the familiar URL in the address bar.

    Search engines

    There is a huge amount of documents on the Internet. To facilitate the search for the necessary information, special search engines are created.

    Search engines are automatic systems that poll servers connected to the global network and store information about the data available on those servers in their databases. In response to a specially formulated query, a search engine tells you where the required data can be found.

    As a rule, search engines consist of three parts: robot, index and request processing program.

    The robot (also called a spider or bot) is a program that visits web pages and reads their content, in whole or in part. Each search engine's robot uses its own scheme for analyzing the content of a web page.

    The search engine index is a repository of search images of the pages visited by robots. The search image of a document (including a web page) is a description of its content in a special information retrieval language; this description contains codes for the document's keywords, reflecting its meaning and content. Search engines differ in how much information their indexes hold and how it is stored. The databases of the leading search engines store information about tens of millions of documents, and their indexes occupy hundreds of gigabytes. Indexes are periodically updated and supplemented, so the same query to the same search engine may return different results at different times.

    The request handler is a program that, in response to the user's request, looks through the index for the necessary information and returns links to the documents found. It sorts the resulting links in descending order of relevance, that is, from the link that best matches the query to the one that matches it least.
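The division into an index and a request handler can be sketched in a few lines. This is a deliberately crude model, assuming that a page's "search image" is just its word counts and that relevance is the total number of query-word occurrences; real engines use much richer descriptions and ranking criteria. The pages are invented.

```python
import re
from collections import Counter, defaultdict


def build_index(pages: dict[str, str]) -> dict[str, Counter]:
    """Inverted index: word -> Counter mapping page URL to occurrences of the word."""
    index: dict[str, Counter] = defaultdict(Counter)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word][url] += 1
    return index


def handle_request(index: dict[str, Counter], query: str) -> list[str]:
    """Look up each query word and return URLs in descending order of the score."""
    scores: Counter = Counter()
    for word in re.findall(r"\w+", query.lower()):
        for url, count in index.get(word, Counter()).items():
            scores[url] += count  # crude relevance: total occurrences of query words
    return [url for url, _ in scores.most_common()]


# Hypothetical pages, invented for illustration.
PAGES = {
    "fish.example": "aquarium fish care, aquarium fish food",
    "band.example": "Aquarium rock band concerts",
    "shop.example": "fish shop",
}

index = build_index(PAGES)
print(handle_request(index, "aquarium fish"))
```

The index is built once by the robot's output and reused for every query, which is why answering a request is fast even though the indexed collection is large.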

    Currently, the three major index-type search engines most popular among Russian Internet users are:

    These systems take into account the grammatical features of the Russian language, so the results of their search in Russian-language resources are of higher quality than in Western systems.

    Search engines differ in the coverage of information resources:

    ♦ general search engines have a database in all areas of knowledge and are distinguished by an extensive index and a large amount of accumulated information;
    ♦ Special purpose search engines look only for sites with specific topics, such as music or museum sites.

    The main characteristics of search engines are:

    ♦ volume of documents in the index;
    ♦ frequency of updating information;
    ♦ the information space that the search engine robot covers and the variety of types of documents about which information is collected;
    ♦ request processing speed;
    ♦ criterion for determining relevance (correspondence of the found document to the search query);
    ♦ the possibility of detailing and clarifying the request.

    Search by rubricator of the search engine

    Search directories are a systematic collection (selection) of links to other Internet resources. Links are organized in the form of a thematic rubricator, which is a hierarchical structure, moving along which you can find the information you need.

    As an example, consider the structure of the Yandex Internet catalog. This is a general-purpose directory, since it contains links to Internet resources in almost every field. It contains the following topics:

    ♦ Business and economics;
    ♦ References and links;
    ♦ Society and politics;
    ♦ Home and family;
    ♦ Science and education;
    ♦ Entertainment and recreation;
    ♦ Computers and communications;
    ♦ Culture and art.

    Each topic includes many subsections, and these in turn contain rubrics, and so on.

    Suppose you are preparing an event for Victory Day and want to find on the Internet the lyrics of Bulat Okudzhava's famous war song "You hear the boots rumble". The search can be organized as follows: Yandex Catalog > Culture and art > Music > Author's song.

    This search method is quite fast and efficient. At the end, you are offered only 5 links, among which there are links to sites with songs of famous bards. It remains only to find on the site an archive with lyrics by B. Okudzhava and select the desired text in it.

    Another example. Suppose you are going to buy a mobile phone and want to compare the characteristics of devices from different companies. The search could be conducted under the following catalog headings: Yandex Catalog > Computers and communications > Mobile communications > Mobile phones.

    Because the result is a short list of links, you can review them quickly and choose a phone by comparing the specifications of different manufacturers and models.
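    A rubricator is essentially a tree, and browsing it means following one heading per level. Below is a minimal Python sketch, assuming the catalog is modeled as nested dictionaries whose inner levels are headings and whose leaves are lists of links. The headings come from the examples above; the link names (`bard-site-1` and so on) are hypothetical placeholders, not real catalog contents.

```python
# Hypothetical fragment of a thematic catalog (rubricator).
# Inner dictionaries are headings; lists at the leaves hold links.
CATALOG = {
    "Culture and art": {
        "Music": {
            "Author's song": ["bard-site-1", "bard-site-2"],
        },
    },
    "Computers and communications": {
        "Mobile communications": {
            "Mobile phones": ["phone-site-1"],
        },
    },
}


def browse(catalog: dict, path: list) -> list:
    """Follow one heading per level from the top of the catalog
    and return the links stored under the final rubric."""
    node = catalog
    for heading in path:
        if not isinstance(node, dict) or heading not in node:
            raise KeyError(f"no rubric {heading!r} at this level")
        node = node[heading]
    if not isinstance(node, list):
        raise ValueError("the path ends at a section, not at a rubric with links")
    return node


print(browse(CATALOG, ["Culture and art", "Music", "Author's song"]))
# ['bard-site-1', 'bard-site-2']
```

    Because every step narrows the choice to one branch, the user ends up with a short list of links, which is exactly why this kind of search is fast.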

    Keyword search

    Most search engines support keyword search, one of the most common types of search. To search by keywords, enter one or more words in the search box and click the Search button. The search engine looks the words up in its database and displays the documents that contain them. There may be very many such documents, but in this case more does not necessarily mean better.
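    One common way to implement such a lookup is an inverted index that maps each word to the documents containing it; a multi-word query then returns the documents that contain all of its words. The Python sketch below assumes that model and uses three made-up documents. It deliberately ignores word forms and query operators, which real engines such as Yandex handle on top of this basic mechanism.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(docs: dict) -> dict:
    """Inverted index: word -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def keyword_search(index: dict, query: str) -> set:
    """Return the ids of documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Made-up documents echoing the aquarium example in this section.
docs = {
    "d1": "Aquarium fish for beginners",
    "d2": "Aquarium band concert tickets",
    "d3": "Tropical fish care",
}
index = build_index(docs)
print(sorted(keyword_search(index, "aquarium")))       # ['d1', 'd2']
print(sorted(keyword_search(index, "aquarium fish")))  # ['d1']
```

    Adding a second word shrinks the result set, because only documents present in both word lists survive the intersection.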

    Let's run a few experiments with a search engine. Suppose we have decided to set up an aquarium and want any information on the subject.

    At first glance, the simplest approach is to search for the word "aquarium". Let's try this in, for example, the Yandex search engine. The result is over 460,000 pages on 3,500 sites: a huge number of links. Moreover, on closer inspection they include sites mentioning B. Grebenshchikov's band Aquarium, shopping centers and informal associations with the same name, and much else that has nothing to do with aquarium fish.

    Clearly, such a search will not satisfy even an undemanding user. Too much time would be spent picking out, from all the documents offered, those that relate to our subject, and even more on reading them.

    We can conclude right away that searching by a single word is usually inadvisable, because one word rarely reveals the topic of a document, web page, or site. The exception is rare words and terms that are almost never used outside their subject area.

    Let's refine the search conditions and enter the phrase "aquarium fish". The result is a little over 20,000 pages on about 650 sites, so the number of links has dropped more than twentyfold. This result suits us better, but the links still include, for example, Russian souvenir sets of matchbox labels with pictures of fish, collections of computer desktop screensavers, catalogs of aquarium fish with photos, and aquarium supply stores.

    Clearly, the search conditions need further refinement.

    To make searching more productive, every search engine has a special query language with its own syntax. These languages are similar in many ways. Learning all of them is difficult, but every search engine has a help system that explains its language.

    Here are ten simple rules for composing a query in the Yandex search engine.

    1. Write keywords in lowercase letters. This ensures that all occurrences of the keywords are found, not just those that start with an uppercase letter.

    2. The search takes into account all forms of a word according to the rules of Russian grammar, regardless of the form used in the query. For example, if the query contains the word "know", then "knows", "knew", and so on will also satisfy the search condition.

    3. To search for a set phrase, enclose the words in quotation marks, for example, “porcelain dishes”.

    4. To search for an exact word form, put an exclamation mark before the word. For example, "!September" restricts the search to exactly that form; in Russian, "!сентября" finds only the genitive case of "September".

    5. To search within a single sentence, separate the words in the query with a space or the & sign: "adventure novel" or "adventure & novel". In other words, several words separated by spaces must all occur in one sentence of the document.

    6. To select only documents that contain every word in the query, put a plus sign "+" in front of each word. Conversely, to exclude a word from the results, put a minus sign "-" in front of it. The "+" and "-" signs must be separated by a space from the preceding word and attached directly to the following one. For example, the query "Volga -car" will find documents that contain the word "Volga" but not the word "car".

    7. To search for synonyms or words close in meaning, put a vertical bar "|" between them. For example, the query "child | kid | baby" will find documents containing any of these words.

    8. Instead of a single word, you can use an entire expression in a query. To do this, enclose it in parentheses, for example "(child | kid | children | baby) +(care | upbringing)".

    9. The "~" (tilde) operator finds documents containing a sentence with the first word but without the second. For example, the query "books ~ shop" will find all documents with a sentence that contains the word "books" but not the word "shop".

    10. A single operator (for example, & or ~) applies within one sentence; a doubled operator (&& or ~~) applies within the whole document. For example, the query "cancer ~~ astrology" will find documents with the word "cancer" that do not contain the word "astrology" anywhere, that is, documents unrelated to astrology.
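    To make rules 5 to 7 more concrete, here is a simplified Python model of how such a query could be evaluated. It is an assumption for illustration only, not Yandex's actual algorithm: unsigned words (and "|" groups) must all meet in one sentence, words with "+" must occur somewhere in the document, words with "-" must not occur in it at all, and "|" lists alternatives. The model has no morphology and no nested parentheses, and it expects signs to be attached to the following word, as rule 6 requires.

```python
import re


def split_sentences(text: str) -> list:
    """Rough splitter: break on . ! ? and return each sentence as a set of lowercase words."""
    parts = re.split(r"[.!?]+", text.lower())
    return [set(re.findall(r"\w+", p)) for p in parts if p.strip()]


def parse_query(query: str):
    """Split a query into required (+), excluded (-) and plain terms.
    Each term is a set of alternatives, e.g. "(child | kid)" -> {"child", "kid"}."""
    query = re.sub(r"\s*\|\s*", "|", query.lower())  # "a | b" -> "a|b"; case is ignored (rule 1)
    required, excluded, plain = [], [], []
    for token in query.split():
        sign = token[0] if token[0] in "+-" else ""
        alternatives = set(token[len(sign):].strip("()").split("|"))
        if sign == "+":
            required.append(alternatives)
        elif sign == "-":
            excluded.append(alternatives)
        else:
            plain.append(alternatives)
    return required, excluded, plain


def matches(doc: str, query: str) -> bool:
    required, excluded, plain = parse_query(query)
    sentences = split_sentences(doc)
    doc_words = set().union(*sentences)
    # "+term": at least one alternative must occur somewhere in the document
    if any(not (alts & doc_words) for alts in required):
        return False
    # "-term": no alternative may occur anywhere in the document
    if any(alts & doc_words for alts in excluded):
        return False
    # plain terms: all of them must meet in at least one sentence (rule 5)
    if plain:
        return any(all(alts & s for alts in plain) for s in sentences)
    return True


print(matches("The Volga is a great river.", "Volga -car"))              # True
print(matches("The Volga car was made in Gorky.", "Volga -car"))         # False
print(matches("An adventure story. A long novel.", "adventure novel"))   # False
```

    The last call returns False because "adventure" and "novel" occur in different sentences, which is how rule 5 distinguishes a sentence-level match from a document-level one.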

    Once you know the most common terms in the field of interest, you can use advanced search. Fig. 3.3 shows the advanced search window of the Yandex search engine. In this mode, the capabilities of the query language are offered as a form. Almost all search engines provide a similar service, including dictionary filters.

    Fig. 3.3. An example of an advanced search in the Yandex system

    With a careful choice of desirable and required words and the exclusion of unwanted terms, such a search can give good results.

    Let's go back to the aquarium fish example. After reading several documents offered by the search engine, it becomes clear that an Internet search should not begin with choosing aquarium fish. An aquarium is a complex biological system whose creation and maintenance require special knowledge, time, and serious investment.

    Based on this information, the person searching may radically change the strategy of the further search and decide to study the specialized literature on the subject first.

    To find literature or full-text documents, one could use the following query:

    "+(aquarium | aquarist | aquaristics) +for beginners +(advice | literature) +(article | thesis | full-text) -(price | shop | delivery | catalog)".

    After processing this query, the search engine returned 195 pages from at least 43 sites.

    As these statistics show, the search was very successful: the very first links lead to the required documents:

    Aquarium setup > Tips for the beginner aquarist > Articles > Aquascope.ru
    http://aquascope.ru/modules/wfsection/article.php?page=l&articleid=49 (32 KB): strict match.
    ADVICE TO BEGINNER AQUARISTS. How to choose and install an aquarium, how to...
    http://www.aquariums.ru/sovna.htm (2 KB), 07/23/2002: non-strict match.

    Now you can summarize the search results, draw conclusions, and decide what to do next:

    ♦ Stop searching, because for one reason or another maintaining an aquarium is beyond your means.
    ♦ Read the suggested articles and start setting up an aquarium.
    ♦ Search for materials about hamsters or budgies.

    Professional Search

    Researchers and specialists need to organize their searches more carefully. A professional search for information on the Internet must meet the following requirements:

    ♦ high search speed;
    ♦ reliability of the received information;
    ♦ completeness of coverage of resources during the search.

    Speed. Search speed depends mainly on two factors: competent search planning (the choice of search services and tools) and skill in working with the selected resource (the ability to grasp its structure and navigation quickly). Search indexes alone are not enough to make a search fast; the Internet also offers a number of other search resources that make a professional search efficient.

    Reliability. The reliability of information obtained from the Internet is a pressing issue, since anyone can publish anything there without any check of its accuracy. This produces a large number of unreliable sources, such as the student essays and term papers that have flooded the Internet.

    There are special search services that help assess the reliability of an information source on the Internet.

    Completeness. A successful, comprehensive collection of information requires knowing the main types of resources that exist today and using a variety of search services. No single search engine covers all the resources of the Internet.

    As a rule, to get a good result the user has to turn to several search engines. You can do this yourself, moving from one system to another, or entrust the work to a metasearch engine ("meta" is the first element of compound words denoting systems that describe or study other systems).

    Fig. 3.4. Metasearch engine windows

    Metasearch engines have no search databases of their own; instead, they use the resources of many other search engines. As a result, the probability of finding the necessary information is high. Metasearch engines are used by the same rules as search engines, because they are a kind of add-on to search engines and rely on their index databases. They also look much like well-known search engines. Fig. 3.4 shows the windows of the metasearch engines myweb.ru and metabot.ru.

    Experience shows that in most cases a better result is achieved by using several independent search indexes than by using one metasearch engine.
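    The text does not say how a metasearch engine combines the lists it receives. One widely used rule, shown here purely as an assumption, is reciprocal rank fusion: each engine adds 1/(k + rank) to the score of every URL it returns, so documents found by several engines rise to the top. The engine names, URLs, and the constant k = 60 in this Python sketch are illustrative.

```python
from collections import defaultdict


def merge_results(result_lists: dict, k: int = 60) -> list:
    """Reciprocal rank fusion: every engine adds 1 / (k + rank) to each URL it returns.
    URLs found by several engines accumulate a higher score."""
    scores = defaultdict(float)
    for urls in result_lists.values():
        for rank, url in enumerate(urls, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest combined score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical ranked lists returned by two engines for the same query.
results = {
    "engine_a": ["x.example", "y.example", "z.example"],
    "engine_b": ["y.example", "w.example"],
}
print(merge_results(results))
# ['y.example', 'x.example', 'w.example', 'z.example']
```

    Here "y.example" comes first even though only one engine ranked it at the top, because both engines returned it.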

    Review questions and tasks

    1. What is the purpose of a browser program?

    2. What browser programs do you know?

    3. Where can a user planning an Internet search find URLs?

    4. How does searching a search engine's subject directory (rubricator) work?

    5. How does keyword search work?

    6. What requirements must a professional search for information on the Internet meet?

    7. When should you use the "+" or "-" signs in a query?

    8. What search conditions does the following Yandex query specify:

    (nanny | teacher | governess) ++ (care | upbringing | supervision).

    9. What does doubling an operator (~~ or ++) mean when building a complex query?

    10. What is search relevance?

    11. What is the purpose of metasearch engines?

    Lecture 4. Information retrieval tools

    The constant updating of the body of information, combined with the growth in data volume, makes it extremely difficult to keep track of the available documents and, accordingly, to search them. Search can be roughly divided into:

    • factual search: encyclopedias, reference books, dictionaries;
    • bibliographic search: libraries, directories, programs;
    • document search: electronic documents, electronic libraries, electronic journals.

    The importance of information retrieval has given rise to an entire industry whose task is to help users navigate cyberspace. This industry consists of special search services, which are traditionally divided into:

    • directories or catalogs
    • search engines

    These two kinds look very similar, because "each directory has its own search engine, and each search engine has its own directory". However, they are based on completely different approaches and technologies. In addition, each type of search service suits a particular type of problem. Information retrieval involves certain strategies, methods, mechanisms, and tools. The behavior of the user who controls the search process is determined not only by the information need but also by the range of instruments the system provides: its technologies and tools. The choice of instrument largely determines the search strategy and search technology.

    Search technologies are standardized sequences of steps (optimized within a particular information retrieval system) for using individual search tools effectively while interacting with the system.

    The search technologies used by information systems can be divided into three categories:

    • thematic catalogs and specialized catalogs (online directories);
    • search engines (full-text search);
    • meta search tools.

    Thematic catalogs process documents and assign each one to one of several predefined categories. In effect, this is the classification-based indexing familiar to every librarian. Specialized catalogs, or guides, are created for individual fields and topics. Search engines (the most advanced search facility on the Internet) implement full-text search technology: the texts on the servers they visit are indexed. With metasearch tools, a query is run simultaneously by several search engines, and the results are merged into a single list sorted by relevance.

    Search tools are an interdependent set of information retrieval languages and data definition/manipulation languages that provide structural and semantic transformations of the objects being processed (documents, dictionaries, collections of search results).

    1. Directories

    Search tools of the first group are electronic directories with a clear hierarchical structure, either systematic or logical-thematic, much like that of a systematic library catalog. Directories let you navigate Internet resources within individual branches of knowledge: moving from the general to the particular, switching between hierarchical branches, going back a few steps, and so on.

    Among the Russian developments in this area are:

    • Aport (address: www.aport.ru),
    • List.ru (address: list.mail.ru),
    • Weblist (address: www.weblist.ru),
    • Ivan Susanin (address: www.susanin.net),
    • Snail (address: www.ulitka.ru).

    The main distinguishing feature of directories is that they are compiled by hand. The editorial staff of each directory, whose work resembles that of the cataloging and classification departments of large libraries, regularly review the contents of newly appeared servers and track changes on existing ones. The data found are analyzed and entered into the sections of the directory according to the accepted classification. The description of a server as a whole (or of a section, if it is a fully independent unit) is accompanied by a brief annotation giving general information about the nature of the information available. Sometimes additional information is recorded about the language of the documents, the traffic of the resource, its physical location, and so on.

    The main parameters by which directories are evaluated are:

    • volume;
    • the efficiency of reflecting new or changed resources;
    • logic and consistency of the hierarchical classification scheme;
    • cross-referencing within the structure.

    The volume of a directory determines its reliability, or "information strength". Some systems have a special mechanism that periodically checks whether a site is available and removes it from the list if it has been "absent" from the Web for a long time. The logic (scientific soundness) of the classification scheme determines how easily users find the information they need. A system of cross-references makes it possible to locate information by different approaches (for example, territorial or sectoral): the classification scheme should automatically lead the user to the desired object whichever search path is chosen.
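    The availability check described above can be modeled as pruning every site whose last successful check is older than some limit. Below is a minimal Python sketch; the 30-day limit, the dates, and the site names are hypothetical, since each directory sets its own policy.

```python
from datetime import datetime, timedelta


def record_check(last_seen: dict, site: str, available: bool, now: datetime) -> None:
    """Remember the time of the last successful availability check of a site."""
    if available:
        last_seen[site] = now


def prune_absent(last_seen: dict, now: datetime, max_absence: timedelta) -> dict:
    """Return only the sites seen on the Web within the last `max_absence`."""
    return {site: seen for site, seen in last_seen.items() if now - seen <= max_absence}


now = datetime(2024, 3, 1)
last_seen = {"a.example": datetime(2024, 2, 20), "b.example": datetime(2023, 12, 1)}
record_check(last_seen, "b.example", available=False, now=now)  # failed check: nothing changes
print(sorted(prune_absent(last_seen, now, timedelta(days=30))))  # ['a.example']
```

    A single failed check does not remove a site; only a long "absence" beyond the limit does, which matches the behavior the text describes.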

    For this type of search tool, the ability to compose queries plays no significant role: complex searches that require detailed queries are not performed with catalogs.

    Directories are designed to solve three types of tasks:

    • orientation in an unfamiliar branch of knowledge;
    • search for large objects, which are, for example, servers of organizations or significant projects;
    • obtaining a ready-made list of resources with a vague search profile (libraries of a certain type, transport schedules, websites of political parties, etc.).

    Another analogy is to compare a resource directory with a library's systematic catalog, in which a book (here, an entire website) is represented only by a description and an annotation.

    2. Search engines

    Search engines work on completely different technological principles. Their task is to provide a detailed search for information, which can only be achieved by recording (indexing) the content of as many web pages as possible. Unlike directories, search engines work automatically and share a common principle of operation.

    Search engines consist of two basic components. The first is a robot program, whose task is to move from server to server, find new (or changed) documents there, and download them to the system's main computer. The robot scans the content of each document and finds new links, both to other documents on the same server and to external sites. The program then follows these links on its own, finds new documents, and the process repeats, recalling the well-known "snowball" method of bibliography. The documents found are processed (indexed) by the second component of the search engine. As a rule, the entire content of the page is taken into account, including text, illustrations, audio and video files. Every word in a document is indexed, which makes search engines usable for detailed searches on the narrowest topics. The resulting index files record which keyword is used, how many times, in which document, and on which server; together they form the database that the librarian queries by entering combinations of keywords in the search box.
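    The robot's "snowball" traversal is a breadth-first walk over links, and the index files record, for each keyword, the server, the document, and the number of occurrences. The Python sketch below assumes a tiny in-memory "Web" (the `WEB` dictionary and its `*.example` URLs are hypothetical) instead of real HTTP requests, and it indexes text only.

```python
import re
from collections import Counter, defaultdict, deque
from urllib.parse import urlparse

# Hypothetical miniature Web: URL -> (page text, outgoing links).
WEB = {
    "http://a.example/": ("Aquarium fish for beginners", ["http://a.example/care", "http://b.example/"]),
    "http://a.example/care": ("Aquarium care and aquarium filters", ["http://a.example/"]),
    "http://b.example/": ("Fish shop", ["http://c.example/"]),
    "http://c.example/": ("Bard songs", []),
}


def crawl(seed: str, fetch, limit: int = 100) -> list:
    """Breadth-first ("snowball") crawl: fetch a page, queue its new links, repeat."""
    seen, queue, visited = {seed}, deque([seed]), []
    while queue and len(visited) < limit:
        url = queue.popleft()
        page = fetch(url)
        if page is None:  # unreachable page: skip it
            continue
        visited.append(url)
        for link in page[1]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


def build_index_files(urls: list, fetch) -> dict:
    """Index file: keyword -> {(server, document URL): number of occurrences}."""
    index = defaultdict(dict)
    for url in urls:
        text, _links = fetch(url)
        server = urlparse(url).netloc
        for word, count in Counter(re.findall(r"\w+", text.lower())).items():
            index[word][(server, url)] = count
    return index


pages = crawl("http://a.example/", WEB.get)
index = build_index_files(pages, WEB.get)
print(pages)
print(index["aquarium"])
```

    Starting from one seed page, the robot reaches all four pages, and the entry for "aquarium" records two occurrences on the "care" page of server a.example.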

    Results are output by a special module that performs intelligent ranking. It takes into account:

    • the location of the term in the document (title, headings, body text) and how often it is repeated;
    • the proportion of the search term in the text of the page;
    • the number and authority of external links to this page from other sites.
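    How these factors are weighted is each engine's own business, and the text gives no formula. The toy Python scoring function below only illustrates combining them: a bonus if the term is in the title, the term frequency, the term's share of the page text, and the total authority of linking pages. The weights, the logarithmic damping, and the page data are arbitrary assumptions.

```python
import math
import re


def words(text: str) -> list:
    return re.findall(r"\w+", text.lower())


def score(term: str, page: dict, weights=(3.0, 1.0, 10.0, 1.0)) -> float:
    """Toy relevance score. `page` has a title, a body and `inbound`,
    a list with the authority of each page that links to it (all hypothetical)."""
    w_title, w_freq, w_share, w_links = weights
    term = term.lower()
    body = words(page["body"])
    in_title = 1.0 if term in words(page["title"]) else 0.0  # location of the term
    freq = body.count(term)                                  # how often it is repeated
    share = freq / len(body) if body else 0.0                # its share of the page text
    links = math.log1p(sum(page["inbound"]))                 # number and authority of links
    return w_title * in_title + w_freq * math.log1p(freq) + w_share * share + w_links * links


def rank(term: str, pages: dict) -> list:
    """Order page URLs by decreasing score."""
    return sorted(pages, key=lambda url: score(term, pages[url]), reverse=True)


pages = {
    "shop.example": {"title": "Shop", "body": "We sell fish and aquarium lamps", "inbound": []},
    "guide.example": {"title": "Aquarium fish", "body": "Aquarium fish care", "inbound": [2.0, 1.0]},
}
print(rank("aquarium", pages))  # ['guide.example', 'shop.example']
```

    Both pages mention "aquarium" once in the body, but the guide wins on title position, term share, and inbound links.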

    The basic parameters of search engines include:

    • number of indexed servers and individual documents (volume of index files);
    • how promptly the database is updated by adding information about new materials and removing obsolete ones;
    • query capabilities;
    • intelligence of the search results ranking system;
    • the presence of additional service functions that facilitate the work of the user.

    How precisely a search engine lets you express a query largely determines the quality of the results. Each engine has its own query vocabulary, which lets you specify the search in different ways.

    All search engines have a search results ranking module; it is the second basic component of every system. The list of factors used to determine a document's place in the list of links is unusually broad: from the position of a word on the page to the rating (authority) of the pages that link to the found document. Well-known international search engines include:

    • Google (address: www.google.com),
    • AlltheWeb (address: www.alltheweb.com),
    • AltaVista (address: www.altavista.com).

    Similar search tools exist in Russia. All of them are designed to work with Russian-language documents and have a powerful http://www.metabot.ru).

    Conclusions for this lecture block

    The search engine selects the pages in its database that match the query and then orders them by decreasing degree of match (note by A.A.).

    Here there is a direct analogy with how distributed union catalogs of libraries work. The key capability of metasearch is sending a user's query simultaneously to several search engines and then merging the results (note by A.A.).

    When using directories, the librarian can expect only very general information on a subject, never detailed data: for the server of a large corporation containing thousands of pages, the directory will give only the name and a few lines of annotation.
