Syntactic parsing is a natural language processing technique for analyzing the grammatical structure of a sentence. The two main types are dependency parsing and constituency parsing. Fig. 1 shows the parse trees produced by dependency parsing and constituency parsing, respectively. Dependency parsing identifies the dependency relationships between the words in a sentence and builds a directed graph that represents them. Each word in the sentence is a node in the graph, and each dependency relationship between two words is an edge. The edges are labeled with the type of dependency relationship, such as subject, object, or modifier. The resulting graph is called a dependency tree or a dependency graph. Constituency parsing analyzes a sentence to identify its syntactic structure and hierarchical organization according to the grammatical rules of the language. The sentence is divided into a hierarchy of phrases, each of which has a specific grammatical structure and serves a particular function within the sentence. These phrases are called constituents; examples include noun phrases, verb phrases, adjective phrases, and prepositional phrases.
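To make the two representations concrete, the following minimal sketch encodes one sentence both ways. The sentence is a made-up example, not the one in Fig. 1. The relation labels (det, nsubj, obj, root) follow the Universal Dependencies style and the phrase and tag labels (S, NP, VP, DT, NN, VBD) follow the Penn Treebank style; both choices are assumptions made for illustration, as is the helper name leaves.

```python
# Illustrative sentence (hypothetical; not taken from Fig. 1).
sentence = ["The", "cat", "chased", "a", "mouse"]

# Dependency parse: one labeled edge (head, relation, dependent) per word.
# "ROOT" is an artificial node that marks the head of the whole sentence.
dependency_edges = [
    ("cat", "det", "The"),
    ("chased", "nsubj", "cat"),
    ("ROOT", "root", "chased"),
    ("mouse", "det", "a"),
    ("chased", "obj", "mouse"),
]

# Constituency parse: nested tuples of the form (label, child, child, ...).
# A pre-terminal is (tag, word); every other node groups smaller constituents.
constituency_tree = (
    "S",
    ("NP", ("DT", "The"), ("NN", "cat")),
    ("VP", ("VBD", "chased"), ("NP", ("DT", "a"), ("NN", "mouse"))),
)


def leaves(tree):
    """Return the words of a constituency tree in left-to-right order."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [children[0]]  # pre-terminal: (tag, word)
    return [word for child in children for word in leaves(child)]
```

In the dependency encoding every word appears exactly once as a dependent, so the number of edges equals the number of words. In the constituency encoding the words appear only at the leaves, and the internal nodes are the constituents.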
In this project, we propose a simple but effective method for constructing constituency parse trees. The constituency parse tree is first converted to a binary tree; an example is shown in Fig. 2. The core idea is that once the heights of the intervals between adjacent words are known, the binarized constituency parse tree can be constructed [1]. Directly predicting these heights requires a complex model to be accurate, so we instead train a binary classifier that compares the heights of intervals pairwise. The height of an interval is then computed as the number of comparisons it wins. This simplifies the model significantly. In future work, we aim to improve the model by combining splitting (top-down) and merging (bottom-up) methods.
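The win-counting step and the tree construction can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function names are hypothetical, the trained classifier is replaced by an arbitrary callable higher(i, j), and the tree is built top-down by splitting at the highest interval, which is one common way to recover a binary tree from interval heights. Ties are broken by taking the leftmost highest interval, which is also an assumption.

```python
from typing import Callable, List, Sequence, Union

# A binary tree is either a single word or a pair (left subtree, right subtree).
Tree = Union[str, tuple]


def heights_from_wins(n_intervals: int,
                      higher: Callable[[int, int], bool]) -> List[int]:
    """Height of interval i = number of other intervals j it beats.

    `higher(i, j)` stands in for the trained binary classifier (hypothetical
    interface): it returns True if interval i is judged higher than interval j.
    """
    return [
        sum(1 for j in range(n_intervals) if j != i and higher(i, j))
        for i in range(n_intervals)
    ]


def build_tree(words: Sequence[str], heights: Sequence[int]) -> Tree:
    """Build a binary tree top-down by splitting at the highest interval.

    heights[k] is the height of the interval between words[k] and words[k + 1],
    so len(heights) must equal len(words) - 1.
    """
    if len(words) == 1:
        return words[0]
    # Index of the highest interval; max() returns the leftmost one on ties.
    k = max(range(len(heights)), key=lambda i: heights[i])
    left = build_tree(words[:k + 1], heights[:k])
    right = build_tree(words[k + 1:], heights[k + 1:])
    return (left, right)


# Usage with an oracle comparator that reads made-up reference heights.
words = ["the", "cat", "sat", "down"]
reference = [0, 2, 1]  # illustrative values only
predicted = heights_from_wins(len(reference),
                              lambda i, j: reference[i] > reference[j])
tree = build_tree(words, predicted)
```

Only the relative order of the heights matters for the construction, which is why counting pairwise wins can replace predicting the heights themselves. The cost is a quadratic number of classifier calls per sentence.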