A Toolkit for Generating Sentences from Context-Free Grammars⋆ Zhiwu Xu, Lixiao Zheng, and Haiming Chen State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, China {zhiwu, zhenglx, chm}@ios.ac.cn Abstract. Producing sentences from a grammar, according to various criteria, is required in many applications. It is also a basic building block for grammar engineering. This paper presents a toolkit for context-free grammars, which mainly consists of several algorithms for sentence generation or enumeration and for coverage analysis for contextfree grammars. The toolkit deals with general context-free grammars. Besides providing implementations of algorithms, the toolkit also provides a simple graphical user interface, through which the user can use the toolkit directly. The toolkit is implemented in Java and is available at http://lcs.ios.ac.cn/~ zhiwu/toolkit.php. In the paper, the overview of the toolkit and the major algorithms implemented in the toolkit are presented, and experimental results and preliminary applications of the toolkit are also contained. 1 Introduction Grammars, especially context-free grammars, are fundamental structures in computer science. Producing sentences from a grammar, according to various coverage criteria or some other constraints, is required in many applications, such as parser/compiler testing, natural language processing, grammar validation, test case generation, bioinformatics, etc. Recently, grammar engineering has been recognized as an emerging field of software engineering due to the fact of lacking solid engineering methods and techniques for grammars on one hand, and due to the importance of grammars in computer science and in other areas on the other hand [1]. Techniques for generating sentences form a basic building block for grammar engineering [2]. Thus a toolkit package for generating sentences is useful. However, presently only very limited abilities to generate sentences are provided in a few grammar tools [3, 4]. On the other hand, there are research issues on sentence generation or enumeration algorithms that need to explore. Based on the research on the algorithms by the authors and others, this paper presents a toolkit for context-free grammars, which mainly consists of several algorithms for sentence generation or enumeration and for coverage analysis for context-free grammars. ⋆ This work was supported by the National Natural Science Foundation of China under Grant Nos. 60573013, 60721061. The commonly used algorithm for sentence generation from context-free grammars is the one proposed by Purdom [5], which produces a small set of sentences satisfying rule coverage. We found out through analysis and experiments that in most cases, Purdom’s algorithm produces too few sentences and the sentences differ too much in length, i.e., some of them are much longer and more complex in structure than others. These sentences are inadequate in many practical applications. To avoid this, an extended algorithm [6] is proposed and implemented which generates more and simpler sentences. A more precise criteria is context-dependent rule coverage [2]. However, as noted in [7], implementation and computation of this metric is more involved. A sentence generation algorithm fulfilling context-dependent rule coverage based on Purdom’s algorithm is proposed and implemented [8], which, to the best of the authors’ knowledge, is the first algorithm for context-dependent rule coverage. An extended algorithm, similar to the idea of the above improvement of Purdom’s algorithm, is also proposed and implemented in the toolkit. The toolkit contains algorithms [9, 10] for sentence and parse tree enumeration for context-free grammars, which are the first linear algorithms and answer the open problem [11] about the existence of such algorithms. The algorithms also lead to several methods of sentence generation for context-free grammars, generally called user-controlled methods in the paper, such as random generation, bounded range enumeration, and structure-sensitive generation of sentences. The toolkit deals with general context-free grammars, which have no restrictions on grammars, and is implemented in Java as Java classes. Besides providing implementations of algorithms, it also provides a simple graphical user interface, through which one can use the toolkit directly. The algorithms implemented in the toolkit have been tested through experiments. The toolkit has also been used in some preliminary applications, which are briefly introduced in the paper. The paper is organized as follows. In the next section we review the basic concepts and notations of context-free grammars. Section 3 gives an overview of the toolkit. Section 4 and Section 5 present the sentence generation and enumeration algorithms contained in the toolkit respectively. Section 6 introduces sentence analysis algorithms contained in the toolkit. Section 7 describes the graphical user interface and presents some examples. Section 8 gives some applications that use our toolkit. Section 9 discusses some related work. Section 10 contains a conclusion. 2 Context-Free Grammar A context-free grammar is a tuple G = hN, T, P, Si, where N is a finite set of nonterminals, T is the alphabet or a finite set of terminals (N ∩ T = ∅), S ∈ N is the start symbol and P ∈ N × (N ∪ T )∗ is a finite set of rules. If (X, u) ∈ P , we write X → u, and call X the left-hand side and u the right-hand side. If there is at least one nonterminal at the right-hand side of rule p, then p is called a nonterminal rule, otherwise a terminal rule. If there are several rules with the same left-hand side X, we write X → u1 | · · · |un . Given α, β ∈ (N ∪ T )∗ , we write α ⇒ β if there exist γ, δ, η ∈ (N ∪ T )∗ and ∗ X ∈ N such that α = γXδ, β = γηδ, and X → η ∈ P . We write α ⇒ β if there exist α0 , . . . , αn (n ≥ 0) such that α = α0 ⇒ . . . ⇒ αn = β. ∗ If S ⇒ α ∈ T ∗ , we say α is a sentence of G. The context-free language defined by G, denoted by L(G), is composed of all sentences of G. Let N = {X1 , . . . , Xm }. Let Li be the language of Gi = hN, T, P, Xi i, i = ∗ ∗ ∗ 1, . . . , m. If there is a derivation S ⇒ αXβ ⇒ αuβ ⇒ w, where X ∈ N , ∗ ∗ α, β ∈ (N ∪ T ) and u, w ∈ T , we say u is a sub-sentence of w. Moreover, p ∗ if S ⇒ u0 X1 u1 . . . Xr ur ⇒ u0 v1 u1 . . . vr ur (= w), where r is the number of nonterminals at the right-hand side of rule p and vi ∈ T ∗ , we say vi is the i-th (from left to right) direct sub-sentence of w. 3 The Toolkit: an Overview In this section, we give an overview of our toolkit. The toolkit contains the following functionalities and algorithms: – Manipulations of context-free grammars • Editing of context-free grammars • Checking the usefulness and reachability of nonterminals – Sentence generation with coverage criteria • Purdom’s algorithm [5] • An extension of Purdom’s algorithm [6] • CDRC-P algorithm [8] • An extension of CDRC-P algorithm – Sentence enumeration • Dong’s algorithms [9, 10] • Sentence generation methods based on sentence enumeration – Sentence analysis • Earley’s algorithm [12] • Coverage analysis algorithms – Graphical user interface (GUI) • Sentence generation panel • Sentence enumeration panel • Sentence analysis panel The toolkit provides an editor for the user to input and edit a context-free grammar. The editor can check whether the grammar is well-defined, that is, the grammar has no useless nor unreachable nonterminals. The toolkit focuses on sentence generation, sentence enumeration and coverage analysis, which constitute the main functionalities of the toolkit. The toolkit also contains a simple graphical user interface, which consists of three panels corresponding to the three main functionalities and through which the user can use the toolkit directly. The toolkit is available at http://lcs.ios.ac.cn/~zhiwu/toolkit.php. The toolkit is implemented in Java as Java classes. It can be used as a Java library. By adding it into the classpath, the user can invoke the algorithms contained in the toolkit in his own programs. The javadoc API specification is available online too. We have used the toolkit to conduct some experiments for the algorithms contained in the toolkit. Some of the experimental results are presented in the paper. To illustrate the algorithms and to describe the graphical user interface, we use the following grammar as the sample grammar throughout the paper. We use angle brackets to distinguish nonterminals from terminals. <S> -> <E> <T> -> <T> * <F> | <F> 4 <E> -> <E> + <T> | <T> <F> -> id | (<E>) Sentence Generation The sentence generation algorithms provided in our toolkit are based on grammar coverage criteria, including rule coverage (RC) [5] and context-dependent rule coverage (CDRC) [2]. For each coverage criterion, two algorithms are implemented, one with a sentence length control mechanism and the other without. We describe these algorithms in the following subsections. 4.1 Sentence Generation with Rule Coverage The notion of rule coverage as a coverage criterion for context-free grammars was introduced by Purdom [5]. Rule coverage simply means that a sentence set explores all the rules of a grammar. Definition 1. Let G = hN, T, P, Si be a context-free grammar. A sentence w ∈ p ∗ L(G) is said to cover a rule p = X → u ∈ P if there is a derivation S ⇒ xXy ⇒ ∗ xuy ⇒ w, where X ∈ N and u, x, y ∈ (N ∪ T )∗ . A sentence set W ⊆ L(G) is said to achieve rule coverage for G, if for each p ∈ P there is a w ∈ W which covers p. Note that the sentence set satisfying rule coverage for a given grammar is not unique. Purdom’s Algorithm Purdom described a fast algorithm for automatically generating a sentence set that achieves rule coverage for a context-free grammar [5]. Purdom’s algorithm takes a context-free grammar as input and produces a small set of sentences such that each rule of the grammar is used at least once. The algorithm proceeds in two distinct phases. The first phase statically collects necessary information from the grammar and stores them in some tables. The information includes: length of the shortest sentence that can be derived from each nonterminal and each rule, length of the shortest sentence which uses a nonterminal nt in its derivation, which rule to use to derive the shortest sentence from a nonterminal nt, which rule to use to introduce a nonterminal nt into the shortest derivation, and so on. The second phase dynamically generates sentences by utilizing the information collected in the first phase. A table known as ONCE calculates the next rule to be used for each nonterminal. The algorithm terminates when all the grammar rules have been exploited. Readers are referred to [5, 13] for a complete interpretation of this algorithm. Example 1. The sentence set generated by Purdom’s algorithm for the sample grammar is {id ∗ (id) + id}, containing one sentence with length1 19. The Extension of Purdom’s Algorithm We found out through analysis and experiments that in most cases, Purdom’s algorithm produces too few sentences and some of them are long and complex, i.e., with complicated derivation structures. These sentences are inadequate in many practical applications. For example, in application to grammar testing or test case generation for the systems based on grammars, such sentences may not be helpful for error location and debugging purposes [6, 14]. To avoid the shortcoming of Purdom’s algorithm, [6] proposed an improved algorithm which still accomplishes the rule coverage goal but generates more and simpler sentences. The algorithm builds upon Purdom’s with two main extensions. First, it calculates a reference length which will be used in the sentence generation process as a reference to control the length of the generated sentence. The choice of the reference length for a given grammar G, denoted as RL(G), is explained as follows. Suppose |W | denotes the maximal sentence length in a set W of sentences. Then RL(G) is defined as the minimal |W | for all W ⊆ L(G) that achieves rule coverage for G. The reference length is calculated by utilizing a data structure clen[p] which represents the length of the shortest sentence covering rule p. It was proved in [6] that the value of RL(G) is exactly the maximal clen[p] for all the rules p of G. clen[p] can be derived from the information already collected in the first phase of Purdom’s original algorithm. Second, in the sentence generation process, a new variable SL is added to forecast the length of the shortest sentence derivable from the current derivation. When choosing a rule to use, it compares this shortest length with the reference length and takes corresponding length control strategies. Interested readers are referred to [6] for a detailed description of this algorithm and its comparison with Purdom’s original algorithm. Example 2. The sentence set generated by the extension of Purdom’s algorithm for the sample grammar is {id + id; (id) ∗ id}, containing two sentences with lengths 10 and 14 respectively. 1 In this paper, length of a sentence is defined to be the number of nodes in its derivation tree, representing the derivation complexity of the sentence. This concept is borrowed from Purdom [5]. 4.2 Sentence Generation with Context-Dependent Rule Coverage Rule coverage explores a grammar’s structure in a relatively weak sense since it considers each rule independently. To achieve more preciseness, Lämmel proposed a generalization of rule coverage such that the context in which a rule is covered is also taken into account. This is known as context-dependent rule coverage [2]. Definition 2. Let G = hN, T, P, Si be a context-free grammar. If Y → uXv ∈ P , where X, Y ∈ N, u, v ∈ (N ∪T )∗ , then Y → u X v is called a direct occurrence of X in G. A sentence w ∈ L(G) is said to cover a rule p = X → z ∈ P for q p ∗ the occurrence Y → u X v if there is a derivation S ⇒ xY y ⇒ xuXvy ⇒ ∗ xuzvy ⇒ w with q = Y → uXv ∈ P . A sentence set W ⊆ L(G) is said to achieve context-dependent rule coverage for G, if all p ∈ P for all occurrences are covered. In our toolkit we provide two algorithms to achieve this coverage. One is based on Purdom’s algorithm, called CDRC-P algorithm [8], and the other extends CDRC-P algorithm with a length control mechanism. CDRC-P Algorithm [8] modified Purdom’s algorithm by changing table ONCE from calculating the next rule to use for each nonterminal to calculating the next rule to use for each direct occurrence of each nonterminal. In the sentence generation process, it records the context (occurrence) of each encountered nonterminal and then consults table ONCE to choose the right rule to rewrite that nonterminal. When all the rules for all the occurrences in the grammar have been covered, the generation process ceases. The sentences generated by this algorithm are in general more complex than those of Purdom’s. Example 3. The sentence set generated by CDRC-P algorithm for the sample grammar is {id ∗ id ∗ ((id) + id ∗ id) + id + id; id}, containing two sentences with lengths 42 and 5 respectively. The Extension of CDRC-P Algorithm As a modified version of Purdom’s algorithm, CDRC-P preserves most of the features of Purdom’s algorithm, i.e., the generated sentences are relatively few and some of them may be rather long and complicated (see example 3). Therefore we adopted the same length control idea as in the extension of Purdom’s algorithm to CDRC-P algorithm so that more and simpler sentences achieving context-dependent rule coverage will be produced. The implementation is similar to that of the extension of Purdom’s algorithm. The only difference is the choice of the reference length. As explained in section 4.1, the reference length for Purdom’s algorithm is selected to be the minimal |W | for all W ⊆ L(G) that achieves rule coverage for G. Accordingly, here we choose the reference length for CDRC-P algorithm to be the minimal |W | for all W ⊆ L(G) that achieves context-dependent rule coverage for G. Similiarly Table 1. Sentences generated by (CDRC-P / the extension) algorithm. Grammar bExp polishExp elemFunc miniPascal ANSI C Java No. of rules 9 12 38 81 213 282 No. of sentences 2/8 2/3 2/35 4/19 75/149 222/471 Avg len. of sentences 32.0/16.1 18.5/16.0 256.5/28.5 191.0/65.9 115.2/87.5 85.4/62.5 Min. of lengths 9/13 5/15 8/12 23/23 8/8 6/6 Max. of lengths 55/19 32/17 505/33 329/82 2131/148 4889/102 to that in [6], the value of RL(G) can be proved to be the maximal oclen[p, o] where oclen[p, o] represents the length of the shortest sentence covering rule p for the occurrence o. Again oclen[p, o] can be calculated by utilizing clen[p] along with other information collected in the first phase of Purdom’s algorithm. Here we omit the details of the calculation due to space limitations. Example 4. The sentence set generated by the extension of CDRC-P algorithm for the sample grammar is {id + id + id; (id) ∗ id ∗ id; (id) ∗ id + id; (id + id) ∗ id; id ∗ (id) + id; id; id + id ∗ id}, containing seven sentences with lengths 15, 18, 19, 19, 19, 5 and 14 respectively. To illustrate the features of the two algorithms with context-dependent rule coverage and their differences, we present the results of our experiments conducted on a few grammars, as shown in Table 1. The first five grammars are the same as those used in [6]. The Java grammar is retrieved from the comp.compilers FTP at ftp://ftp.iecc.com/pub/file/. For each grammar, we recorded the number of sentences, as well as the average, minimal and maximal length of the sentences generated by each algorithm. The number of rules for each grammar is also presented for comparison of grammar size with the number of generated sentences. 5 Sentence Enumeration The sentence enumeration algorithm provided in our toolkit is the one proposed by Dong [9, 10], which is the first linear algorithm for general context-free grammars and answers the open problem [11] about the existence of such algorithm. Dong’s algorithm enumerates the sentences of a general context-free grammar in hierarchical lexicographic order. Moreover, Dong’s algorithm can be easily extended to enumerate the parse trees of a general context-free grammar [10], which is also contained in the toolkit. In this section, we first introduce Dong’s algorithm (5.1), then we discuss the implementation (5.2) and the extension to sentence generation (5.3). 5.1 Introduction to Dong’s Algorithm According to the height of the parse tree2 , the set of sentences of a contextfree grammar is partitioned into (in)finite hierarchies, such that sentences of the same height belong to the same hierarchy. The 0-th hierarchy consists of sentences of height 1, which is determined by the grammar directly. The n-th (n > 0) hierarchy consists of sentences of height n + 1, which is constructed by utilizing the previous well-constructed hierarchies mechanically. Hierarchical Lexicographic Order To partition the sentences into hierarchies, Dong’s algorithm constructs four different kinds of sentence sets: – heapin , containing the sentences in Li whose height is no more than n + 1, – hierarchyni , containing the sentences of height n + 1 in Li , – clusternp , containing the sentences of height n + 1 using nonterminal rule p as the first one in the derivation, p – cubep,t n , containing the sentences in clustern whose t-th (from right to left) direct sub-sentence is from the (n − 1)-th hierarchy. The constructions of these four sets are described by Equations (1-5). i heapin = ∪n j=0 hierarchyj ( {ω|∃a terminal rule p̄ = Xi → ω} hierarchyni = p i ∪lj=1 clusternj ( cubep,r n = 1, 1 clusternp = p,1 p,r cuben ∪ · · · ∪ cuben n>1 (1) n = 0, n>0 i i1 r−t+1 r ) , · · · , heapin−1 cubep,t n = cubep (heapn−2 , · · · , hierarchyn−1 cubep (S1 , · · · , Sr ) = {u0 v1 u1 · · · vr ur |v1 ∈ S1 , . . . , vr ∈ Sr } (2) (3) (4) (5) where n is the hierarchy index, i is the nonterminal index, li is the number of the nonterminal rules whose left-hand side is Xi , p is the nonterminal rule of the form Xi → u0 Xi1 u1 · · · Xir ur , r is the number of the nonterminals at the right-hand side of rule p, t ∈ [1, r] is the cube index and S1 ⊆ Li1 , . . . , Sr ⊆ Lir . Definition 3. Let s be a sentence set. The volume data |s| of the set s is the number of sentences contained in s. Corresponding to the four kinds of sentence sets, there are four kinds of volume data: heap volume data, hierarchy volume data, cluster volume data and cube volume data. Consider the sample grammar. The hierarchy volume data 2 In the paper, the height of a parse tree is defined as the maximum path length which is the number of nonterminals in a path traversed from the root to the leaf node. from the 0-th hierarchy to the 5-th hierarchy are shown in Table 2. Since there is no sentence of height 1 derived from nonterminals S, E and T , their hierarchy volume data on the 0-th hierarchy all equal to 0. While the hierarchy volume data |hierarchy0F | is 1 because there is one and only one sentence of height 1 (id) derived from F . According to Equations (1-5), it is not difficult to calculate the volume data. Table 2. Hierarchy Volume Data from the 0-th to the 5-th Hierarchy Hierarchies 0 1 2 3 4 5 S 0 0 0 1 3 11 E 0 0 1 3 11 113 T 0 1 1 1 5 37 F 1 0 0 1 3 11 Apparently, according to Equations (1-5), the hierarchies in Li are disjoint finite sets, and every hierarchy (except the 0-th hierarchy) consists of some disjoint clusters and every cluster consists of some disjoint cubes. Thus, Dong’s algorithm sorts the set of sentences by the following rules: 1. 2. 3. 4. 5. hierarchy0i ≺ · · · ≺ hierarchyni , hierarchy0i is sorted by lexicographic order, pl clusternp1 ≺ · · · ≺ clustern i (n > 0), p,1 p,r cuben ≺ · · · ≺ cuben (n > 1), s, s̃ ∈ cubep (S1 , · · · , Sr ): s < s̃, iff ∃t ∈ [1, r], v1 = ṽ1 , . . . , vt−1 = ṽt−1 , vt < ṽt , where for two disjoint set S1 and S2 , the inequality S1 ≺ S2 means that ∀s1 ∈ S1 , ∀s2 ∈ S2 , s1 < s2 holds, and the other notations refer to Equations (1-5). Therefore, the set of sentences is well-ordered. The order relation stated above is called hierarchical lexicographic order. Enumeration Procedures The most important enumeration procedure is the next procedure, which generates the next sentence of a given sentence. There are four possible different cases for a given sentence and its next sentence: 1. 2. 3. 4. they they they they are are are are in in in in different hierarchies, the same hierarchy but different clusters, the same cluster but different cubes, the same cube. Two modes are presented for this procedure: bottom-up or top-down. The topdown mode runs from the different hierarchies case to the same cube case, while the bottom-up one does reversely. The time complexities of them are both O(n). But in practice, the time needed by the bottom-up one is usually less than that needed by the top-down one, as the same cube case occurs more frequently. The previous procedure, which generates the previous sentence of a given sentence, is similar to the next one and we do not discuss it here. Since there exists a one-to-one correspondence between the set of sentences and the set of natural numbers, there exist two procedures for the conversion between sentences and natural numbers, namely N2L and L2N respectively. The N2L procedure converts a natural number to the corresponding sentence, while the L2N procedure does the opposite. Note that when computing a natural number from a given sentence, it needs to make sure that the sentence belongs to the grammar. 5.2 Implementation In our toolkit, two implementations are provided for Dong’s algorithm: one with the specific volume data and the other without. There is essentially no difference between the two implementations except the type of volume data. The first one uses the BigInteger type to compute and store the specific volume data, while the other one uses the Boolean type to denote the emptiness of the volume data. Of course, the first one needs more time and space than the other in practice. Both implementations can be used to compute sentences one by one and to enumerate sentences in order. While with the specific volume data, the conversion between sentences and natural numbers and the top-down version of next procedure can be implemented. Therefore, we use the one with the specific volume data in the graphical user interface. With this one, we have implemented three functionalities for the interface. The first one is to query the volume data for an arbitrary hierarchy. The second one is to do some operations about a sentence, such as generating the next or previous sentence for a given sentence and converting between sentences and natural numbers. The last one is to enumerate a bounded range of sentences, which can also be restricted in some hierarchy, some cluster or some cube. In a word, with the specific volume data, it is convenient for us to implement more functionalities. Using the toolkit, we conducted some experiments for Dong’s algorithm. One example is on the sample grammar. We enumerated the first 1000 sentences of each hierarchy from the 7-th hierarchy to the 15-th hierarchy (the numbers were selected arbitrarily). We chose the top-down mode to generate the next sentence. The experiments were performed on a PC of P4 3.2G CPU and 2G RAM. These experimental results are given in Figure 1, which shows that the real computing time for generating the next sentence of height n is O(n). 5.3 Sentence Generation Dong’s algorithm can be used to implement several methods of sentence generation, generally called user-controlled methods in the paper, such as random generation, bounded range enumeration, and structure-sensitive generation of sentences. With a random number generator, Dong’s algorithm can generate Fig. 1. The total time of enumerating sentences on the sample grammar sentences randomly by using N2L. Given a bounded range indicated by, for example, two numbers, Dong’s algorithm can enumerate all the sentences in this range. Hierarchy, cluster, and cube can also be used to specify restrictions on sentence generation, which is called structure-sensitive generation. For example, the sentences generated from some cube have a similar structure. The structuresensitive generation and the two sentence generations above are orthogonal. This indicates that we can generate some random sentences or a bounded range of sentences in some hierarchy, cluster or cube. Further, Dong’s algorithm can also be used for sentence generation with coverage criteria. To exploit some rule p, it needs to generate such a sentence that contains a sub-sentence from the cluster clusternp . Similarly, to exploit a rule p = X → z for the occurrence q = Y → u X v (see Definition 2), it needs to generate such a sentence that contains a sub-sentence from those sentences, which are in the cluster clusternq and whose t-th direct sub-sentence (assume that X is the t-th nonterminal at the right-hand side of rule q) is from the p X cluster clusterm instead of the hierarchy hierarchyn−1 or the heap heapX n−1 X (or heapn−2 ). However, we found out through analysis and experiments that it will be time-consuming to generate a sentence set that satisfies rule coverage or context-dependent rule coverage by enumeration starting from the first sentence, as it produces a lot of sentences that repeatedly cover already-used rules. It likely needs to collect some effective information as that used in Purdom’s algorithm, such as which rule to use to introduce a nonterminal into the shortest derivation. We will consider this problem in the future. Some of the controlled methods have been implemented in the toolkit. Through the interface, the user can specify to enumerate sentences in a bounded range by giving two numbers. The user can also restrict the range in some hierarchy, some cluster or some cube by giving their indices. 6 Sentence Analysis There are several parsing algorithms for determining whether a sentence belongs to a grammar. As our toolkit deals with general context-free grammars, we use Earley’s algorithm [12] for sentence parsing. Earley’s algorithm is an efficient parsing algorithm for general context-free grammars. We implemented Earley’s algorithm in our toolkit as follows. Given a grammar and a sentence, if the sentence belongs to the grammar, the toolkit extracts all possible parses of the sentence. This is useful for checking sentence ambiguity. Based on Earley’s algorithm, the coverage of a sentence set, including rule coverage and context-dependent rule coverage, can be analyzed. We implemented these two coverage analysis algorithms in our toolkit, which are described in the following. If a sentence set does not satisfy rule coverage, it would not satisfy context-dependent rule coverage either. The rule coverage analysis algorithm uses a table named as MARK to indicate whether a rule is exploited. To start with, the algorithm initializes all the cells of the table MARK as false, that is, all the rules are unexploited. Then the algorithm parses each sentence in the set using Earley’s algorithm and marks all the exploited rules true in table MARK. Once some sentence is rejected by the grammar, the algorithm exits by marking the sentence unaccepted. When the parsing of the sentence set ends, it checks whether all the rules have been exploited according to the table MARK. The context-dependent rule coverage analysis algorithm is similar to the one above and can be modified from it by changing marking the exploited rules to marking the exploited occurrences (see Definition 2). In addition, if the given grammar is ambiguous, both algorithms pick only one of the derivations of sentence for coverage analysis. Therefore, in this case, a sentence set generated according to some criterion may be stated not to satisfy the same criterion. 7 Graphical User Interface In this section, we describe the functionalities of the graphical user interface provided in our toolkit and present some examples. The interface consists of three panels, which correspond to the three main functionalities: sentence generation, sentence enumeration and sentence analysis. Each of these panels is implemented independently as a Java class, so that each of them can be used independently. A general view of the interface is presented in Figure 2. 7.1 Editor and Console Each panel contains a grammar editor which enables the user to edit, load or save a grammar and to check whether a grammar is well-defined, and an output console which outputs the outcomes and extra information like sentence length, running time and so on. Fig. 2. Editing of the sample grammar The grammar rules are written as the following form: <X>−>α1| . . . |αn , where nonterminals are bracketed by angle brackets to be distinguished from terminals. Once the grammar rules are input, the editor will check whether the grammar is well-defined. Figure 2 shows an example of editing the sample grammar. By default, the start symbol corresponds to the left-hand side of the first rule. The user can also indicate the start symbol through the “Start” dropdown box. 7.2 Sentence Generation Panel The sentence generation panel allows the user to generate a sentence set that satisfies rule coverage or context-dependent rule coverage. A length control mechanism can as well be configured on in the sentence generation process by marking the “Length Control” check box. After generation, the number of generated sentences, the information of sentences’ lengths, and the total time for generating sentences are also counted. For instance, we generate a sentence set that satisfies rule coverage from the sample grammar with the length control. The result is shown in Figure 3. Fig. 3. Sentence generation with rule coverage and length control 7.3 Sentence Enumeration Panel The sentence enumeration panel consists of three tabs. Figure 5 shows a general view of the panel. The first tab enables the user to query the volume data among a bounded hierarchical range by giving the “From” hierarchy index and the “To” hierarchy index. It also allows the user to choose which kinds of volume data to query by marking the corresponding check boxes. For example, Figure 4 shows the result about querying the hierarchy and heap volume data of the sample grammar from the 5-th hierarchy to the 7-th hierarchy. Fig. 4. The hierarchy and heap volume data from the 5-th to 7-th hierarchy The second tab allows the generation of the previous or next sentence for a given sentence and the conversion between sentences and natural numbers. There are two modes, bottom-up and top-down, for computing the next or previous sentence of a given sentence. Through the “Next Mode” dropdown box, the user can choose either he likes. According to hierarchical lexicographic order [9, 10], there exists a one-to-one correspondence between natural numbers and sentences. Therefore, given a sentence, we can compute its sequence number and vice versa. An example is to compute the 100-th sentence of the sample grammar, showed by Figure 5. Fig. 5. Computing the 100th sentence The third tab allows the user to enumerate a bounded range of sentences by giving the sequence numbers of the first and last sentences. The range can be restricted in a hierarchy, cluster or cube, by giving the corresponding indices in which case only the sentences in the intersection of the range and the restriction are enumerated. The hierarchies, clusters and cubes are used for expert users. For normal users, we will consider how to give the restriction by giving some rules or structures in the future. After enumeration, the total time for enumerating sentences, the actual number of enumerated sentences and the average time for enumerating a sentence are also counted. For instance, we enumerate the sentences in the range [0,100] in the 5-th hierarchy for the sample grammar. Since the 5-th hierarchy volume data is 11 (< 100), only 11 sentences are enumerated. The result is given in Figure 6. Fig. 6. Enumerating the sentences in [0,100] in the 5-th hierarchy When enumerating a bounded range of sentences, the console does not output all the sentences. Additionally, the toolkit allows the user to export the enumerated sentences into a file if he sets the “File” checkbox on. The output file contains the corresponding sequence number and parse tree for each enumerated sentence, as well as the statistic information mentioned above. Figure 7 presents the output file of the above example (See Figure 6). 7.4 Sentence Analysis Panel Through the sentence analysis panel, the user can parse a sentence or analyze the coverage of a sentence set, including rule coverage and context-dependent rule coverage. Since we have generated a sentence set from the sample grammar with rule coverage and length control (see Figure 3), we parse this sentence set as an instance. Figure 8 shows the result of the coverage analysis for the sentence set. 8 Applications The toolkit has been used in several research projects, namely SAQ [15], grammar testing methods [16], and testing of LFC. Fig. 7. The output file containing the objects in [0,100] in the 5-th hierarchy Fig. 8. Analyzing the coverage of a sentence set generated from the sample grammar Table 3. Results for fault detection using CDRC-P algorithm. Programs Name LoC3 bEval 8 pEval 11 Diff 34 Diff2 46 idCount 60 Total 23 122 130 273 650 Mutants Killed 9 104 120 118 516 Ratio 39.1% 85.3% 92.3% 43.2% 79.4% SAQ supports construction and validation of specifications consisting of contextfree grammars and operations upon context-free languages. Initially SAQ supports grammar validation but quite weak [15]. We enhanced it by adding several sentence generation and enumeration algorithms contained in this toolkit. In the experiments, several errors in a number of grammars in SAQ bases were found. It is observed that the bounded range enumeration is very useful for this kind of grammar validation. The second project is an extension of the research of the grammar validation of SAQ, which aims at systematic methods and techniques for testing contextfree grammars. A systematic framework for testing context-free grammars is implemented using the algorithms in the toolkit [16]. From experiments some errors were found in the C and Java grammars that are from comp.compilers FTP at ftp://ftp.iecc.com/pub/file/, see [16] for detail. LFC is a language based on recursive functions on context-free languages [17– 19]. It uses context-free grammars to specify data types, and supports pattern matching definition of functions, where patterns represent structures of the grammars. LFC was designed to support formal specification acquisition and was used in SAQ. Test data for LFC programs can be derived directly by using the sentence generation algorithms provided by this toolkit. We have conducted preliminary experiments to investigate the usefulness of the generated test data in terms of fault detection capability. In particular, we took CDRC-P algorithm as test data generator and conducted mutation testing [20] on five LFC programs, which are taken from SAQ. We applied five mutation operators [21] to each program and compared the result of each mutant with the original result to check whether a mutant is killed by the test data. Experimental results are summarized in Table 3. From the last column we can see that the ratio of the killed mutants to the total mutants is above 30% for all the five programs, which reflects the relatively good capability for fault detection of the test data generated by the CDRC-P algorithm in this toolkit. Furthermore, there are situations where sentences that satisfy some properties are required. In these cases we can enumerate sentences in a bounded range and select valid sentences w.r.t. the properties from them. 3 Line of Codes 9 Related Work Grammar Tool Bilska et al. [22] presented a collection of tools for formal languages and automata theory. The tools include JFLAP for creating and simulating finite automata, pushdown automata and Turing machines, PumpLemma for proving specific languages are not regular, and some others. Møler [23] implemented a grammar tool which contains a parser and an ambiguity analyzer for context-free grammars. The parser parses a sentence without giving parses. This tool is used mainly for analyzing grammar ambiguity. Almeida et al. [24] presented an interactive graphical environment CGM for the manipulation of context-free languages. It allows the editing of context-free grammars, the conversion to Chomsky normal form, sentence parsing and the construction of parse trees. The above tools do not provide sentence generation, sentence enumeration, or coverage analysis as our toolkit. Programming Languages Laboratory at University of Calgary provided an online tool Context Free Grammar Checker [3] to check the basic properties of context free grammars, including the usefulness and reachability of the nonterminals, the first sets, the follow sets and so on. After checking, it generates no more than 20 sentences, which are the first 20 ones ordered by the sentence length (token number). Compared with ours, its sentence generation is too simple. Another related work is the sentence generation tool Forson [4], implemented by Alfonso Tarantini. Forson takes a Bison grammar file as input and provides random sentence generation, following Grow’s algorithm (may not terminate), or sentence generation with rule coverage, following Purdom’s algorithm, for the grammar defined in the input file. It only covers a small part of the functionalities of our toolkit. Moreover, Forson is a batch program, which means that it does not provide a graphical user interface. Sentence Enumeration Algorithm Mäkinen [25] presented a linear algorithm to enumerate the sentences in lexicographic order, but only for some subclass of context-free grammars. Dömösi [11] proposed an algorithm for general context-free grammars and Huang [26] presented a modification of Dömösi’s algorithm which is also for general context-free grammars. But neither of them are linear. Dömösi’s algorithm generates the next sentence of length (token number) n in time O(n4 ), while Huang’s modification runs in time O(n4 ) for general grammars and O(n3 ) for unambiguous grammars. 10 Conclusion The paper presents a toolkit for generating sentences from context-free grammars. The toolkit supports sentence generation with coverage criteria, sentence enumeration and sentence analysis. It also contains a simple graphical user interface, through which the user can use the toolkit directly. The toolkit provides richer functionalities for sentence generation, enumeration and analysis than existing tools, thus is expected to have more applications. In the future, we plan to study the unresolved problems mentioned in the paper and further enrich the toolkit by adding more functionalities, such as a graphical parse tree editor. Acknowledgment The authors thank Yunmei Dong for his suggestions on developing the toolkit and on implementing, improving and experimenting with the sentence enumeration algorithm and its GUI. References 1. Klint, P., Lämmel, R., Verhoef, C.: Towards an engineering discipline for grammarware. ACM Transaction on Software Engineering and Methodology 14(3) (2005) 331–380 2. Lämmel, R.: Grammar testing. In: Proceedings of Fundamental Approaches to Software Engineering (FASE’01). Volume 2029. (2001) 201–216 3. Programming Languages Laboratory, U.o.C.: Context free grammar checker. http: //smlweb.cpsc.ucalgary.ca/ 4. Tarantini, A.: Forson. http://sourceforge.net/projects/forson/ 5. Purdom, P.: A sentence generator for testing parsers. BIT 12(3) (1972) 366–375 6. Zheng, L., Wu, D.: A sentence generation algorithm for testing grammars. In: Proceedings of the 33rd Annual International Computer Software and Applications Conference (COMPSAC’09). Volume 1. (2009) 130–135 7. Alves, T.L., Visser, J.: A case study in grammar engineering. In: Proceedings of the 1st International Conference on Software Language Engineering (SLE’08). Volume 5452. (2008) 285–304 8. Shen, Y., Chen, H.: Sentence generation based on context-dependent rule coverage. Computer Engineering and Applications 41(17) (2005) 96–100 In Chinese. 9. Dong, Y.: Counting and hierarchical lexicographic enumeration of CFL sentences. Science in China (Series E) 36 (2006) 1375–1413 In Chinese. 10. Dong, Y.: Linear algorithm for lexicographic enumeration of CFG parse trees. Science in China (Series F - Information Science) 52(7) (2009) 1177–1202 11. Dömösi, P.: Unusual algorithms for lexicographical enumeration. Acta Cybernet 14(3) (2000) 461–468 12. Earley, J.: An efficient context-free parsing algorithm. Communications of the ACM 13(2) (1970) 94–102 13. Malloy, B.A., Power, J.F.: An interpretation of purdom’s algorithm for automatic generation of test cases. In: Proceedings of the 1st International Conference on Computer and Information Science (ICIS’01). (2001) 3–5 14. Woo1, G., Chae, H.S., Jang, H.: An intermediate representation approach to reducing test suites for retargeted compilers. In: Proceedings of the 12th International Conference on Reliable Software Technologies. (2007) 100–113 15. Dong, Y., Li, K., Chen, H., Hu, Y., Zhang, R., Tang, R.: Design and implementation of the formal specification acquisition system saq. In: Proceedings of the Conference on Software: Theory and Practice, IFIP 16th World Computer Congress 2000. (2000) 201–211 16. Zheng, L., Chen, H.: A systematic framework for grammar testing. In: Proceedings of the 8th International Conference on Computer and Information Science (ICIS’09). (2009) 1013–1019 17. Dong, Y.: Recursive functions of context free languages (I)-the definitions of CFPRF and CFRF. Science in China (Series F) 45(1) (2002) 25–39 18. Dong, Y.: Recursive functions of context free languages (II)-validity of CFPRF and CFRF definitions. Science in China (Series F) 45(2) (2002) 1–21 19. Chen, H., Dong, Y.: Towards practical computable functions on context-free languages. In: Proceedings of the 4th Annual Conference on Theory and Applications of Models of Computation(TAMC’06). Volume 3959. (2006) 555–565 20. Richard A. DeMillo, Richard J. Lipton, F.G.S.: Hints on test data selection: Help for the practicing programmer. (1978) 21. Offutt, A.J., Lee, A., Rothermel, G., Untch, R.H., Zapf, C.: An experimental determination of sufficient mutant operators. ACM Transactions on Software Engineering Methodology 5 (1996) 99–118 22. Bilska, A.O., Leider, K.H., Procopiuc, M., Procopiuc, O., Rodger, S.H., Salemme, J.R., Tsang, E.: A collection of tools for making automata theory and formal languages come alive. In: ACM SIGCSE Bulletin. (1997) 15–19 23. Møler, A.: dk.brics.grammar. http://www.brics.dk/grammar/ (2008) 24. Almeida, A., Alves, J., Moreira, N., Reis, R.: CGM: A context-free grammar manipulator. In: Proceedings of Compiler, Related Technologies and Applications (CoRTA’08). (2008) 25. Mäkinen, E.: On lexicographic enumeration of regular and context-free languages. Acta Cybernet 13 (1997) 55–61 26. Huang, W.: Enumerating sentences of context free language based on first one in order. Journal of Computer Ressearch and Development 41(1) (2004) 9–14 In Chinese.

© Copyright 2021 Paperzz