Regular Expression to DFA
Last Updated : 04 Oct, 2024
Regular expressions define patterns for matching strings, and automata theory recognizes those patterns with finite automata. A common way to construct a Deterministic Finite Automaton (DFA) from a given regular expression is to first construct an NFA and then convert it into an equivalent DFA using subset construction. This two-step procedure can be avoided by constructing the DFA directly from the regular expression.
What is a DFA?
A DFA is a finite automaton in which, for every state and every input symbol, there is exactly one transition to a next state. Unlike NFAs, DFAs have no ε-transitions (transitions that consume no input). Because of this determinism, the next state is completely determined by the current state and the current input symbol, which makes DFAs an efficient model for pattern recognition: each input symbol is handled by a single transition.
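Determinism means the transition function can be stored as an ordinary lookup table keyed by (state, symbol), and running the automaton costs one lookup per input symbol with no backtracking. The minimal Python sketch below assumes the transition table obtained for (a|b)*abb later in this article; the names DTRAN, START, ACCEPTING and accepts are illustrative, not part of any library.

```python
# Transition table of the DFA for (a|b)*abb (states A-D).
# Each (state, symbol) pair has exactly one next state.
DTRAN = {
    ("A", "a"): "B", ("A", "b"): "A",
    ("B", "a"): "B", ("B", "b"): "C",
    ("C", "a"): "B", ("C", "b"): "D",
    ("D", "a"): "B", ("D", "b"): "A",
}
START = "A"
ACCEPTING = {"D"}


def accepts(word: str) -> bool:
    """Run the DFA on word; symbols outside {a, b} cause rejection."""
    state = START
    for symbol in word:
        if (state, symbol) not in DTRAN:
            return False  # symbol not in the alphabet
        state = DTRAN[(state, symbol)]  # the only possible move
    return state in ACCEPTING


for w in ["abb", "aabb", "babb", "ab", "abba", ""]:
    print(repr(w), accepts(w))
```

Running it prints True for abb, aabb and babb, and False for ab, abba and the empty string.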
Construction of DFA
To construct a DFA directly from a regular expression, we follow the steps below.
Example: Suppose we are given the regular expression r = (a|b)*abb.
1. First, we construct the augmented regular expression r# by concatenating a unique right-end marker # to r. This gives the accepting state for r a transition on #, making it an important state of the NFA for r#; as a result, the accepting states of the final DFA are exactly those that contain the position of #.
So, r# = (a|b)*abb#
2. Then we construct the syntax tree for r#.
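One way to build the syntax tree programmatically is a small recursive-descent parser. The sketch below is a hypothetical helper, not part of the method itself: it assumes the usual precedence (* binds tightest, then concatenation, then |), supports only single-character symbols, |, * and parentheses (no ε), represents nodes as plain tuples, and numbers the leaves 1, 2, 3, ... from left to right as it creates them.

```python
from itertools import count

# Hypothetical tuple representation of syntax-tree nodes:
#   ("leaf", symbol, position), ("or", c1, c2), ("cat", c1, c2), ("star", c1)


def parse(regex: str):
    """Parse regex into a syntax tree and number its leaves left to right.

    Supports single-character symbols, '|', '*' and parentheses; no epsilon.
    Precedence: '*' binds tightest, then concatenation, then '|'.
    """
    pos = 0                    # index of the next unread character
    positions = count(1)       # yields 1, 2, 3, ... for successive leaves

    def peek():
        return regex[pos] if pos < len(regex) else None

    def alternation():
        nonlocal pos
        node = concatenation()
        while peek() == "|":
            pos += 1
            node = ("or", node, concatenation())
        return node

    def concatenation():
        node = starred()
        # Keep concatenating until '|', ')' or the end of the input.
        while peek() is not None and peek() not in "|)":
            node = ("cat", node, starred())   # left-associative
        return node

    def starred():
        nonlocal pos
        node = atom()
        while peek() == "*":
            pos += 1
            node = ("star", node)
        return node

    def atom():
        nonlocal pos
        ch = peek()
        if ch == "(":
            pos += 1
            node = alternation()
            if peek() != ")":
                raise ValueError(f"expected ')' at index {pos}")
            pos += 1
            return node
        if ch is None or ch in "|)*":
            raise ValueError(f"unexpected {ch!r} at index {pos}")
        pos += 1
        return ("leaf", ch, next(positions))

    tree = alternation()
    if pos != len(regex):
        raise ValueError(f"unexpected {regex[pos]!r} at index {pos}")
    return tree


augmented = "(a|b)*abb" + "#"   # step 1: append the end marker
print(parse(augmented))
```

For (a|b)*abb# the result is a left-leaning chain of cat-nodes whose leaves carry positions 1 to 6, with the or-node for a|b under the star-node.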
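Figure: Syntax tree for (a|b)*abb#
Each leaf not labeled ε is given a unique integer called its position. Numbering the leaves from left to right gives a = 1, b = 2, a = 3, b = 4, b = 5 and # = 6.
3. Next, we evaluate four functions: nullable, firstpos, lastpos, and followpos.
- nullable(n) is true for a syntax tree node n if and only if the regular expression represented by n has ε in its language.
- firstpos(n) gives the set of positions that can match the first symbol of a string generated by the subexpression rooted at n.
- lastpos(n) gives the set of positions that can match the last symbol of a string generated by the subexpression rooted at n.
- followpos(p), for a position p, is the set of positions q such that some string in the language of r# has a symbol matched by p immediately followed by a symbol matched by q.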
We refer to an interior node as a cat-node, or-node, or star-node if it is labeled by a concatenation, | or * operator, respectively.
Rules for Computing nullable, firstpos, and lastpos
Node n | nullable(n) | firstpos(n) | lastpos(n) |
---|---|---|---|
n is a leaf labeled ε | true | ∅ | ∅ |
n is a leaf labeled with position i | false | {i} | {i} |
n is an or-node with left child c1 and right child c2 | nullable(c1) or nullable(c2) | firstpos(c1) ∪ firstpos(c2) | lastpos(c1) ∪ lastpos(c2) |
n is a cat-node with left child c1 and right child c2 | nullable(c1) and nullable(c2) | if nullable(c1) then firstpos(c1) ∪ firstpos(c2) else firstpos(c1) | if nullable(c2) then lastpos(c1) ∪ lastpos(c2) else lastpos(c2) |
n is a star-node with child c1 | true | firstpos(c1) | lastpos(c1) |
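The table translates almost line for line into recursive functions. In this minimal Python sketch, nodes are plain tuples (a hypothetical representation: ("eps",), ("leaf", symbol, position), ("or", c1, c2), ("cat", c1, c2), ("star", c1)), and the tree for (a|b)*abb# is built by hand with two small hypothetical helpers, leaf and concat.

```python
# Hypothetical tuple representation of syntax-tree nodes:
#   ("eps",), ("leaf", symbol, position),
#   ("or", c1, c2), ("cat", c1, c2), ("star", c1)


def nullable(n) -> bool:
    kind = n[0]
    if kind == "eps":
        return True
    if kind == "leaf":
        return False
    if kind == "or":
        return nullable(n[1]) or nullable(n[2])
    if kind == "cat":
        return nullable(n[1]) and nullable(n[2])
    if kind == "star":
        return True
    raise ValueError(f"unknown node {n!r}")


def firstpos(n) -> frozenset:
    kind = n[0]
    if kind == "eps":
        return frozenset()
    if kind == "leaf":
        return frozenset({n[2]})
    if kind == "or":
        return firstpos(n[1]) | firstpos(n[2])
    if kind == "cat":
        c1, c2 = n[1], n[2]
        # A nullable left child lets the right child supply the first symbol.
        return (firstpos(c1) | firstpos(c2)) if nullable(c1) else firstpos(c1)
    if kind == "star":
        return firstpos(n[1])
    raise ValueError(f"unknown node {n!r}")


def lastpos(n) -> frozenset:
    kind = n[0]
    if kind == "eps":
        return frozenset()
    if kind == "leaf":
        return frozenset({n[2]})
    if kind == "or":
        return lastpos(n[1]) | lastpos(n[2])
    if kind == "cat":
        c1, c2 = n[1], n[2]
        # A nullable right child lets the left child supply the last symbol.
        return (lastpos(c1) | lastpos(c2)) if nullable(c2) else lastpos(c2)
    if kind == "star":
        return lastpos(n[1])
    raise ValueError(f"unknown node {n!r}")


def leaf(symbol, position):
    return ("leaf", symbol, position)


def concat(*parts):
    """Left-associative chain of cat-nodes."""
    node = parts[0]
    for part in parts[1:]:
        node = ("cat", node, part)
    return node


star = ("star", ("or", leaf("a", 1), leaf("b", 2)))
root = concat(star, leaf("a", 3), leaf("b", 4), leaf("b", 5), leaf("#", 6))

print(nullable(star), sorted(firstpos(star)), sorted(lastpos(star)))  # True [1, 2] [1, 2]
print(nullable(root), sorted(firstpos(root)), sorted(lastpos(root)))  # False [1, 2, 3] [6]
```

A real implementation would usually cache these values per node, since the cat-node rules call nullable and firstpos on the same subtrees repeatedly.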
Rules for computing followpos:
- If n is a cat-node with left child c1 and right child c2 and i is a position in lastpos(c1), then all positions in firstpos(c2) are in followpos(i).
- If n is a star-node and i is a position in lastpos(n), then all positions in firstpos(n) are in followpos(i).
Having seen the rules for computing nullable, firstpos, and lastpos, we now apply them to the syntax tree of the given regular expression (a|b)*abb#.
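Figure: firstpos and lastpos for the nodes in the syntax tree for (a|b)*abb#
For example, the star-node for (a|b)* has firstpos = lastpos = {1, 2}, and the root has firstpos = {1, 2, 3} and lastpos = {6}. Applying the two followpos rules at every cat-node and star-node, bottom up, gives followpos for each position: the star-node adds {1, 2} to followpos(1) and followpos(2), and the cat-nodes add 3 to followpos(1) and followpos(2), 4 to followpos(3), 5 to followpos(4), and 6 to followpos(5).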
Position | followpos |
---|---|
1 | {1, 2, 3} |
2 | {1, 2, 3} |
3 | {4} |
4 | {5} |
5 | {6} |
6 | ∅ |
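Both followpos rules have the same shape: every position in some source set gains a target set in its followpos. The Python sketch below assumes the firstpos and lastpos values of the relevant interior nodes are already known and hard-codes them for (a|b)*abb# (they follow from the rules in the earlier table); NODES and compute_followpos are hypothetical names. Running it reproduces the followpos table.

```python
from collections import defaultdict

# Hypothetical encoding of the interior nodes that trigger a followpos rule
# in the tree for (a|b)*abb#, as (kind, source set, target set):
#   cat-node  n = c1.c2 -> ("cat",  lastpos(c1), firstpos(c2))
#   star-node n         -> ("star", lastpos(n),  firstpos(n))
NODES = [
    ("star", {1, 2}, {1, 2}),  # (a|b)*
    ("cat", {1, 2}, {3}),      # (a|b)* . a
    ("cat", {3}, {4}),         # ... . b
    ("cat", {4}, {5}),         # ... . b
    ("cat", {5}, {6}),         # ... . #
]


def compute_followpos(nodes):
    """Apply both followpos rules: each i in source gains all of target."""
    followpos = defaultdict(set)
    for _kind, source, target in nodes:
        for i in source:
            followpos[i] |= target
    return followpos


fp = compute_followpos(NODES)
for p in range(1, 7):
    print(p, sorted(fp[p]))
```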
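4. Now we construct Dstates, the set of states of DFA D, and Dtran, the transition table for D. Each state of D is a set of positions: on input a, a state S moves to the union of followpos(p) over all positions p in S that correspond to a. The start state of D is firstpos(root), and the accepting states are all those containing the position associated with the end marker #.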
In our example, firstpos(root) = {1, 2, 3}. Let this state be A and consider the input symbol a. Positions 1 and 3 correspond to a, so B = followpos(1) ∪ followpos(3) = {1, 2, 3, 4}. Since this set has not been seen yet, we add it to Dstates and set Dtran[A, a] := B.
For input b, only position 2 in A corresponds to b, so we compute followpos(2) = {1, 2, 3}. This set is A itself, so we do not add a new state to Dstates; we only add the transition Dtran[A, b] := A.
Continuing in this way with the remaining states gives state C = {1, 2, 3, 5} and state D = {1, 2, 3, 6}, and the following transition table.
State | Input a | Input b |
---|---|---|
→ A | B | A |
B | B | C |
C | B | D |
D | B | A |
Here, A is the start state and D is the only accepting state, since it is the only state that contains position 6, the position of #.
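Step 4 is a worklist loop over sets of positions. The following minimal Python sketch hard-codes the followpos table and the symbol at each position for (a|b)*abb#; build_dfa, FOLLOWPOS, SYMBOL and the other names are hypothetical. It names states A, B, C, ... in the order they are discovered, which matches the naming used above.

```python
# Data for (a|b)*abb#, copied from the followpos table and the leaf labels.
FOLLOWPOS = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}, 5: {6}, 6: set()}
SYMBOL = {1: "a", 2: "b", 3: "a", 4: "b", 5: "b", 6: "#"}
END_MARKER = 6           # position of '#'
ALPHABET = ["a", "b"]    # '#' is not an input symbol


def build_dfa(start_positions):
    """Build Dstates and Dtran from followpos, starting at firstpos(root)."""
    start = frozenset(start_positions)
    dstates = [start]      # all states found so far, in discovery order
    unmarked = [start]     # states whose transitions are not computed yet
    dtran = {}
    while unmarked:
        S = unmarked.pop(0)
        for a in ALPHABET:
            # U = union of followpos(p) for every p in S that is labeled a
            U = frozenset().union(*(FOLLOWPOS[p] for p in S if SYMBOL[p] == a))
            if not U:
                continue   # would be a dead state; left undefined here
            if U not in dstates:
                dstates.append(U)
                unmarked.append(U)
            dtran[(S, a)] = U
    accepting = [S for S in dstates if END_MARKER in S]
    return dstates, dtran, accepting


dstates, dtran, accepting = build_dfa({1, 2, 3})   # firstpos(root)
name = {S: chr(ord("A") + i) for i, S in enumerate(dstates)}
for S in dstates:
    moves = {a: name.get(dtran.get((S, a)), "-") for a in ALPHABET}
    print(name[S], sorted(S), moves, "accepting" if S in accepting else "")
```

It prints states A to D with positions {1, 2, 3}, {1, 2, 3, 4}, {1, 2, 3, 5} and {1, 2, 3, 6}, the same moves as the transition table, and only D marked accepting. An empty union would correspond to a dead state; that never happens here because every state contains positions 1 and 2.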
5. Finally, we draw the DFA from the transition table above.
The final DFA is:
Figure: DFA for (a|b)*abb
Conclusion
Constructing a DFA from a regular expression is a fundamental process in automata theory that connects formal languages to practice, for example lexical analysis in compilers. The direct construction skips the intermediate NFA, so it takes fewer steps than building an NFA and then applying subset construction, while still producing a deterministic automaton. Understanding how DFAs work not only deepens knowledge of formal languages but also helps in implementing efficient pattern recognition and parsing algorithms in many areas of computer science.