Regular languages ​​and finite state machines. Finite state machines and regular languages ​​Theory of formal languages ​​for dummies

We know the operations of combining languages. Let's define the operations of concatenation and iteration (sometimes called Kleene closure).

Let L 1 and L 2 be languages ​​in the alphabet

Then, i.e. language concatenation consists of concatenations of all words of the first language with all words of the second language. In particular, if , then , and if , then .

Let us introduce notation for the “degrees” of the language L:

Thus, L i includes all words that can be divided into i consecutive words from L .

Iteration (L) * of the language L is formed by all words that can be divided into several consecutive words from L:

It can be represented using degrees:

It is often convenient to consider a "truncated" iteration of the language, which does not contain the empty word if it does not exist in the language: . This is not a new operation, but simply a convenient shorthand for the expression .

Note also that if we consider the alphabet as a finite language consisting of single-letter words, then the previously introduced notation for the set of all words, including the empty one, in the alphabet corresponds to the definition of iteration of this language.

The following table provides a formal inductive definition regular expressions over the alphabet and the languages ​​they represent.

Expression r Language L r
L a =(a)
Let r 1 and r 2 be L r1 and L r2 -representable
regular expressions. them languages.
Then the following expressions
are regular and represent the languages:
r=(r 1 +r 2)
r=(r 1 circr 2)
r=(r 1) * L r =L r1 *

When recording regular expressions We will omit the concatenation sign and assume that the * operation has higher priority than concatenation and + , and concatenation has higher priority than + . This will allow many parentheses to be omitted. For example, can be written as 10(1 * + 0) .

Definition 5.1. Two regular expressions r and p are said to be equivalent if the languages ​​they represent are the same, i.e. L r = L p . In this case we write r = p.

It is easy to check, for example, the following properties of regular operations:

  • r + p= p+ r (commutativity of the union),
  • (r+p) +q = r + (p+q) (associativity of the union),
  • (r p) q = r (p q) (associativity of concatenation),
  • (r *) * = r * (iteration idempotency),
  • (r +p) q = rq + pq(distributivity).

Example 5.1. Let us prove as an example the not so obvious equality: (r + p) * = (r * p *) *.

Let L 1 be the language represented by its left side, and L 2 by its right. The empty word belongs to both languages. If the word is non-empty, then by the definition of iteration it can be represented as a concatenation of subwords belonging to the language. But this language is a subset of the language L"=L r * L p * (why?). Therefore . Conversely, if a word , then it is representable as a concatenation of subwords belonging to the language L ". Each of such subwords v is representable in the form v= v 1 1 ... v k 1 v 1 2 ... v l 2, where for all i=1, ... , k is a subword and for all j=1, ... , l is a subword (it is possible that k or l is equal to 0). But this means that w is a concatenation of subwords, each of which belongs to and, therefore, .

Regular language

In language theory regular set(or, in regular language) is called a formal language that satisfies the following properties. These simple properties are such that the class of regular sets is convenient to study as a whole, and the results obtained are applicable in many important cases of formal languages. That is, the concept of a regular set is an example of a mathematical structure.

Definition of a regular set

Let Σ be a finite alphabet. Regular set R(Σ) in the alphabet Σ is defined by the following recursive properties:

№. Property Description
1 The empty set is a regular set in the alphabet Σ
2 A set consisting of only one empty string is a regular set in the alphabet Σ
3 A set consisting of any one symbol of the alphabet Σ is a regular set in the alphabet Σ
4 If any two sets are regular in the alphabet Σ, then their union is also a regular set in the alphabet Σ
5 If any two sets are regular in the alphabet Σ, then the set composed of all possible combinations of pairs of their elements is also a regular set in the alphabet Σ
6 If any set is regular in the alphabet Σ, then the set of all possible combinations of its elements is also a regular set in the alphabet Σ
Nothing other than the following is a regular set in the alphabet Σ

See also

  • Building a parser based on the automata approach

Wikimedia Foundation. 2010.

See what “Regular language” is in other dictionaries:

    regular language- - Telecommunications topics, basic concepts EN regular language... Technical Translator's Guide

    - (Latin regularius, from regula rule). Correct, correctly arranged, made. Regular running of the machine. Even stroke. Regular life. Correct, decent, monotonous life. Dictionary of foreign words included in the Russian language.... ... Dictionary of foreign words of the Russian language

    Cm … Dictionary of synonyms

    Ancient written language- A language with long written traditions, that is, it received a written language adapted to the structure of a given language several centuries ago, and the functioning of the written version of the language was not episodic, but regular, with... ... Dictionary of sociolinguistic terms

    This term has other meanings, see Quechua. Quechua Self-name: Qhichwa Simi, Runa Simi Countries ... Wikipedia

    The frame of a building with a grid of columns or posts based on a step of the same size (Bulgarian; Български) uniform skeleton (Czech; Čeština) pravidelný skelet (German; Deutsch) regelmäßiges Skelett (Hungarian; Magyar) szabályos... ... Construction dictionary

    - [FRENCH PARK] a park that has a geometrically regular layout, usually an axial layout (Bulgarian; Български) frenski park (Czech; Čeština) francouzský park (German; Deutsch) regelmäßiger Park; Französischer Park… … Construction dictionary

    Quechua Self-name: Qhichwa Simi, Runa Simi Countries: Argentina, Bolivia, Colombia, Peru, Chile, Ecuador Regions: Andes Official status: Peru ... Wikipedia

    Tagalog language- (tagal, tagala, tagalo; tagalog) one of the Philippine languages. The area of ​​initial distribution is in the most politically, economically and culturally important region of the Republic of the Philippines - the central and southern parts of the island... ... Linguistic encyclopedic dictionary

Books

  • Derived verbs. Secrets of Finnish grammar. Textbook, Safronov V.D.. The manual is devoted to one of the most interesting and insufficiently presented sections of Finnish grammar in Russian-language educational literature - derived verbs. They are formed from verbs and from names...

Regular grammars, finite automata, and regular sets (and the regular expressions that represent them) are three different ways of specifying regular languages.

Statement

A language is PM if and only if it is specified by a left-linear (right-linear) grammar. A language can be defined by a left-linear (right-linear) grammar if and only if it is a regular set.

A language is PM if and only if it is specified by a finite state machine. A language is recognized by a state machine if and only if it is a PM.

All these three methods are equivalent. There are algorithms that allow, for a language defined in one of these ways, to construct another method that defines the same language. A detailed description of these algorithms is available in the literature (see list).

For example, to find a regular expression for a language defined by a right-linear grammar, it is necessary to construct and solve a system of equations with regular coefficients.

In the theory of programming languages, the most important role is played by the equivalence of CA and regular grammars, since such grammars are used to define lexical structures of programming languages. Having created an automaton based on a known grammar, we obtain a recognizer for a given language. In this way, it is possible to solve the parsing problem for lexical constructions of the language.

To construct a CA based on a known regular grammar, it must be reduced to an automaton form. The set of states of the automaton will correspond to the set of non-terminal symbols of the grammar. 2.3.2 Properties of regular languages

A set is called closed under some operation if, as a result of performing this operation on any of its elements, a new element belonging to the same set is obtained.

Regular sets are closed under the operations of intersection, union, addition, iteration, concatenation, changing symbol names, and substituting strings for symbols.

For regular languages, many problems can be solved that are insoluble for other types of languages. For example, the following problems are solvable regardless of how the language is specified:

Equivalence problem: Given two regular languages ​​L 1 (V) and L 2 (V). It is necessary to determine whether they are equivalent.

The problem of chain belonging to language. Given a regular language L(V), a string of symbols V * . We need to check whether this chain belongs to the language.

The problem of the emptiness of language. Given a regular language L(V). It is necessary to check whether this language is empty, i.e. find at least one chain, L(V).

Sometimes it is necessary to prove whether a certain language is regular. If it is possible to specify this language in one of the considered ways, then it is regular. But if such a method cannot be found, it is unknown whether the language is not regular, or whether it was simply not possible to find a way to specify it. There is a simple method for checking whether the language in question is regular. It has been proven that if for a certain language the so-called expansion lemma, then it is regular. If this lemma is not satisfied, the language is not regular.

The growth lemma is formulated as follows. If given a regular language and a sufficiently long chain of symbols belonging to this language, then in it one can find a non-empty subchain that can be repeated as many times as desired, and all chains obtained in this way will also belong to the regular language in question.

Formally, the lemma is written as follows. If a language L is given, then the constant p>0 is such that if L and p, then the chain can be written in the form where 0

Example. Consider the language L=(a n b n n>0). Let us prove that it is not regular using the lemma on the proliferation of languages.

Let this language be regular, then the expansion lemma must hold for it. Let's take some chain of this language = a n b n and write it in the form. If a + or b + , then the string i does not belong to the language for any i, which contradicts the conditions of the lemma. If a + b + , then chain 2 also does not belong to the language L. We have obtained a contradiction, therefore, the language is not regular.

Regular setsand regular expressions

Regular sets

In this section we will consider a class of sets of chains over a finite dictionary, which are very easy to describe by formulas of some kind. These sets are called regular.

Definition 1.Let V 1 And V 2 - many chains. Let's define three operationson these sets.

    Union: V 1 V 2 =(|   V 1 ) or   V 2 .

    Concatenation (product, gluing): Vl V2 = (|  V 1 ,  V 2 ) The sign of the concatenation operation is usually omitted.

Example: V, = (abc, ba),V 2 = (b, cb). V1V2 =(abcb, abccb, bab, bacb).

Let us denote by V n the product of n sets V:V n =VV...V,V° =() (here  is an empty chain).

Example: V 1 = (abc, ba), V 1 2 = (abcabc, abcba, baba, baabc).

3. Iteration: V* = V 0 V 1 V 2 ... =   =0 ∞ V n .

Example: V = (a, bc), V* = (, a, bc, aa, abc, bcbc, bca, aaa, aabc,...).

Definition 4.13.Class of regular sets over a finite dictionary V definegoes like this:

    union ST;

    ST concatenation;

    iteration S* and T*.

5. If a set cannot be constructed by finitely applying rules 1-4, then it is irregular.

Examples of regular sets: (ab, ba)* (aa); (b)((c)(d, ab)*). Examples of irregular sets: (a n b n | n > 0); ( | in chain  the number of occurrences of symbols a and b coincide).

Regular Expressions

Regular sets are good because they can be very simply described by formulas, which we will call regular expressions.

Definition 2.Regular expression class over finite dictionary V definegoes like this:

    their sum R1+R2;

    their product R1R2;

    their iteration R1* and R2*.

4. If an expression is not constructed by applying rules 1-3 finitely, then it is not regular.

The product symbol can be omitted. To reduce the number of parentheses, as in any algebra, operation priorities are used: iteration has the highest priority; the work has less priority; addition has the lowest priority.

Examples of regular expressions: ab + bа*; (aa)*b + (c + dab)*.

Obviously, regular sets and regular expressions are very close. But they represent different entities: a regular set is a set of chains (in the general case infinite), and regular expression isa formula schematically showing how the corresponding regular set was constructed using the operations listed above(and this formula is finite).

Let R^ be a regular set corresponding to the regular expression R. Then:

Thus, a regular expression is a finite formula that defines an infinite number of chains, that is, a language.

Let's look at examples of regular expressions and their corresponding languages.

Regular expression

Corresponding language

All strings starting with b followed by an arbitrary number of characters a

All strings of a and b containing exactly two occurrences of b

All chains of a and b in which the symbols b appear only in pairs

(a+b)*(aa+bb)(a+b)*

All chains of a and b containing at least one pair of adjacent a or b

(0+1)*11001(0+1)*

All chains of 0 and 1 containing subchain 11001

All strings of a and b, starting with a and ending with b

Obviously, a set of strings is regular if and only if it can be represented by a regular expression. However, the same set of chains can be represented by different regular expressions, for example, a set of chains consisting of symbols a and containing at least two a can be represented by the expressions: aa*a; a*aaa*; aaa*; a*aa*aa*, etc.

Definition 3.Two regular expressions R1 And R2 are called equivalent (denoted Rl = R2) then and only whenR1 ^ = R2 ^ .

Thus, aa*a = a*aaa* = aaa* = a*aa*aa*. The question arises how to determine the equivalence of two regular expressions.

Theorem1 . For any regular expressions R, S And T fair:

These relations can be proven by checking the equality of the corresponding sets of chains. They can be used to simplify regular expressions. For example: b (b + aa*b) = b (b + aa*b) = b ( + aa*) b = ba*b. Hence b (b + aa*b) = ba*b, which is not obvious.

Kleene's theorem

Regular expressions are the final formulas that define regular languages. But finite state machines also have a similar property - they also define languages. The question arises: how do the classes of languages ​​defined by finite state machines and regular expressions relate to each other? Let's denote And many automatic languages, R is a set of regular languages. Stephen Kleene, an American mathematician, proved the following theorem.

Theorem2 . (Kleene’s theorem.) The classes of regular sets and automaton languages ​​coincide, that is A = R .

In other words, every automaton language can be specified by a formula (regular expression) and every regular set can be recognized by a finite automaton. We will prove this theorem constructively, in two steps. At the first step, we will prove that any automaton language is a regular set (or, what is the same, for any finite automaton we can construct a regular expression that specifies the language recognized by this automaton). In the second step, we will prove that any regular set is an automaton language (or, what is the same, from any regular expression one can construct a finite automaton that admits exactly the chains of the corresponding regular set).

Let us introduce the transition graph model as a generalization of the finite automaton model. The transition graph has one initial and an arbitrary number of final vertices, and directed edges are marked, unlike a finite automaton, not with symbols, but with regular expressions. The transition graph admits a chain a if A belongs to a set of chains, described by the product of regular expressions R 1 R 2 ...R n , which mark the path from the initial vertex to one of the final vertices. The set of chains allowed by the transition graph forms the language allowed by it.

Rice. 1. Transition graph

In Fig. Figure 1 shows a transition graph that allows, for example, a chain abbca, since the path s->r->p->s->r->q, which leads to the final state q, is marked with a chain of regular expressions ab* c*a. A finite state machine is a special case of a transition graph, and therefore all languages ​​that are accepted by state machines are also supported by transition graphs.

Theorem 3.Every automaton language is a regular set, A  R.

Proof. A transition graph with one initial and one final vertex, in which the only edge from the initial to the final vertex is labeled with a regular expression R, admits the language R^ (Fig. 1).

Rice. 2. Transition graph admitting regular language FT

Let us now prove that every automaton language is a regular set by reducing any transition graph without changing the language it allows to an equivalent form (Fig. 2).

Any finite automaton and any transition graph can always be represented in a normalized form, in which there is only one initial vertex with only outgoing edges and only one final vertex with only incoming edges (Fig. 3).

Rice. 3. Transition graph with one initial and one final vertex

With a transition graph presented in a normalized form, two reduction operations can be performed - edge reduction and vertex reduction - while preserving the language allowed by this transition graph:

a) rib reduction:

B ) vertex reduction (replacement is performed for each path passing through vertex p, followed by its discarding as an unreachable state):

Obviously, each reduction operation does not change the language recognized by the transition graph, but reduces either the number of edges or the number of vertices, and in the end the reductions will bring the transition graph to the form shown in Fig. 2. The theorem is proven: every automaton language is a regular set.

Example

Let the finite machine A be given:

We construct an equivalent transition graph in a normalized form.

Vertex reduction 3:

Reduction of arcs and application of the rule R = R:

Vertex reduction 2:

Reduction of arc and vertex 1:

Thus, the language recognized by automaton A is given by the regular expression: R A = b+(a+bb)(b+ab)*a.

Let us prove Kleene's theorem in the other direction.

Theorem 2.Every regular set is an automaton language: R A.

Proof. Let us show that for each regular expression R a finite automaton A r (possibly non-deterministic) can be constructed that recognizes the language specified by R. We will define such automata recursively.

(the initial and final states of A are combined).

Example(continuation)

Related articles

  • Test “Rus in the 9th – early 11th centuries”

    Task 1. Arrange historical events in chronological order. Write down the numbers that indicate historical events in the correct sequence in the table. The Baptism of Rus' The Calling of the Varangians The Emergence of an Empire...

  • Golovko Alexander Valentinovich

    Alexander Valentinovich Golovko Alexander Valentinovich Golovko Lua error in Module:Wikidata on line 170: attempt to index field "wikibase" (a nil value). Creed: Lua error in Module:Wikidata on line 170: attempt to...

  • Phrases from the joker Phrases from the dark knight

    "The Dark Knight" is a science-fiction thriller filmed in 2008. The high-quality and dynamic film was complemented by an excellent cast. The film stars Heath Ledger, Christian Bale, Maggie Gyllenhaal, Aaron Eckhart, Michael Caine, Morgan Freeman and...

  • Biology - the science of life

    Specifics of biological drawing for middle school students Biological drawing is one of the generally accepted tools for studying biological objects and structures. There are many good tutorials that address this issue....

  • Amino acids necessary for humans How to remember all the amino acids

    1. Amino acids Scarlet Waltz. Flies (from the log) Copper of Farewells, Grass of the Final. Clay Gray, Anxiety, Ceremony, Silence. Slate Depths of Falling Leaves (Fall into) Giant Arcades. That is: Alanine, Valine, Leucine, Isoleucine, Methionine, Proline,...

  • Independent reproduction of Andrea Rossi's cold fusion reactor in Russia

    Owners know firsthand how much it costs to provide a private home with electricity and heat. In this article I want to share the latest news about the development of a new type of heat generator. The likelihood of an energy revolution when...