Parsing & Syntax Analysis

Parsing is like identifying phrases and sentences out of a set of words. Imagine I throw these words at you, how would you interpret them?

COMPILERS ARE GREAT AND SO ARE SANDWORMS

You would read it as Compilers are great and so are sandworms. This is because you know the English grammar and understand that the word and is a conjugating word and it joins phrases or words on each of its sides. Well, what you did was parsing! You formed a sentence with two phrases out of words.

Parsing is the second step of understanding the source code after lexical analysis. It is also called syntax analysis because what we are really doing here is identifying parts of the source code according to the language grammar. Taking an example from Gom, let’s try to parse the following:

return x * x;

The lexer would read this and convert it into a list of tokens as:

[RETURN, IDENTIFIER, MUL, IDENTIFIER, SEMICOLON]

The parser would refer to the Gom grammar and convert this into something like:

returnStatement: Statement {
	expr: BinaryOperation {
		op: Token {
			type: MUL
		},
		lhs: Terminal {
			value: "x"
		},
		rhs: Terminal {
			value: "x"
		}
	}
}

This representation tells us that return x * x; is a return statement with an expression which multiplies two terms (both identifiers). And if you also notice the structure of the object returnStatement, it forms a tree.

Untitled

Parsing:

Top-down
Bottom-up

Top-down parsing:

Recursive descent parsing (general form of top-down)
Predictive parsing (non-recursive), also called LL(k) - left to right input, leftmost derivation