Extended BNF grammar for Formalized English (FE)
The grammar below is not up to date. For example, coreferences
may now be written before or after the quantifier and concept type, e.g.
"Tom the cat", "the cat Tom" and "the cat named Tom" are now allowed.
Unless preceded by ",", "and" or "(", each relation must be preceded by
"is", "are", "has" or "have", as in "Tom is on a table that is on a mat
that is near a bed". This is needed to have an LR(1) grammar and also avoids
human misinterpretations about which relation is connected to which concept.
("?" means 0 or 1 times, "*" means 0 to N times, "+" means 1 to N times)
FE := (Tree ("."|"?"))+
Tree := Concept Branches*
QuotedTree := "~"? "`" Tree "'" Context?
Context := "(" Branches2 ")"
Branches := With Relation1 Tree (And With? Relation Tree)*
| With? Relation2 Tree (And With? Relation Tree)*
| "is"("a"|"an") Tree (And "is"("a"|"an") Tree)*
Branches2 := With? Relation Tree (And With? Relation Tree)*
| "is"("a"|"an") Tree (And "is"("a"|"an") Tree)*
With := ("with"|"at"|"has for"|"have for"|"for"|"is"|"are"|
("can"|"may")("be"|"have for")) "the"?
And := "and" | ","
Relation := Relation1 | Relation2
Relation1 := (RelationType|Coreference) "of"? Annotation? Context? "<="?
Relation2 := ("=>" | "<=>" | "<=" ) Concept
| ("=" | "!=" | "<" | "=<" | ">" | ">=" | "or") Concept
RelationType := Term_or_string
Coreference := "*" Term_or_number
Concept := ConceptCore Annotation?
ConceptCore := CorefOrIndiv Quantifier Restrictor CQ?
| Quantifier Restrictor CorefOrIndiv? CQ?
| GroupOf Quantifier? Restrictor CorefDecl? Collection?
| GroupOf Quantifier? CorefDecl? Collection
| (Number | "~"Coreference | CQ | CorefOrIndiv CQ?)
CorefOrIndiv := CorefDecl
| "named"? Term_or_string ("\\" ConceptType)?
//Term_or_string: individual (ex: Tom) or attribute (ex: high)
CorefDecl := "*"Term_or_number
| "*"Term_or_number "!=" "*"Term_or_number
| "*"Term_or_number "!=" Term_or_number
CQ := Collection | QuotedTree
Restrictor := Qualifier? ConceptType
| Qualifier? "[" ConceptType Branches "]"
ConceptType := Term_or_string
Qualifier := "good"|"bad" | "important"|"small"|"big"|"great" | "certain"
Quantifier := "a" | "an" | "some" | "the"
| "any" | "every" | "most" "of"? "the"?
| "at" "least" Number "%"? "of"? "the"?
| "at" "most" Number "%"? "of"? "the"?
| "between" Number "%"? "and" Number "%"? "of"? "the"?
| Number "to" Number "%"? "of"? "the"?
| "from" Number "to" Number "%"? "of"? "the"?
| "mostly" | "several" "of"? "the"?
| Number "%"? "of"? "the"?
| ("many"|"few"|"dozens"|"hundreds"
|"thousands"|"millions"|"billions") "of"? "the"?
GroupOf := CorefOrIndiv?("a"|"the")("group""of" | "bag""of" |
"set""of"|"sequence""of"|"alternative")
| "together"
Collection := "{" (Set|Bag|OrderedSet|OrderedBag|XOR_Set|OR_Bag) "}" CollSize?
Set := Element ("," Element)*
Bag := Element ("&" Element)*
OrderedSet := Element ("<" Element)*
OrderedBag := Element ("=<" Element)*
XOR_Set := Element ("/" Element)*
OR_Bag := Element ("|" Element)*
Element := Concept | "*"
CollSize := "@" Number
Term_or_number:= Term | number
Term_or_string:= Term | string
Term := TermLetter1 TermLetter*
TermLetter1 := [a-z] | "#"[a-z]
TermLetter := [a-z] | "#" | "_" | "-" | "/" | "?" | "&" | "~"
| Digit
| [.?][a-z0-9?#~] //thus "." ok within a term but not at the end
| "://" //thus a URL may be a term
Number := ("+"|"-")? Digit+ ("." Digit* )?
Digit := [0-9]
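
The Term and Number rules above can be approximated with regular expressions. The following Python sketch is only an illustration, not the FE parser's actual lexer: the names TERM_RE, NUMBER_RE, is_term and is_number are hypothetical, and the sketch assumes that input is lowercased before matching. The "?" case of the rule [.?][a-z0-9?#~] is not written out separately because "?" and every character that may follow it are already single TermLetter characters, so the accepted language is the same.

```python
import re

# TermLetter1 := [a-z] | "#"[a-z]
TERM_LETTER1 = r"#?[a-z]"

# TermLetter: "://" (so that a URL may be a term), "." followed by an
# allowed character (so that a term cannot end with "."), or one of the
# single characters listed in the grammar.
TERM_LETTER = r"(?:://|\.[a-z0-9?#~]|[a-z0-9#_\-/?&~])"

TERM_RE = re.compile(TERM_LETTER1 + TERM_LETTER + "*")

# Number := ("+"|"-")? Digit+ ("." Digit*)?
NUMBER_RE = re.compile(r"[+-]?[0-9]+(?:\.[0-9]*)?")


def is_term(text):
    """Return True if the whole text matches the Term rule.

    Assumption: the text is lowercased first, since uppercase letters
    are treated as lowercase letters.
    """
    return TERM_RE.fullmatch(text.lower()) is not None


def is_number(text):
    """Return True if the whole text matches the Number rule."""
    return NUMBER_RE.fullmatch(text) is not None
```

Under these assumptions, is_term("http://www.example.org/a") holds while is_term("tom.") does not, which matches the remark that "." is accepted within a term but not at its end.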
//Additional notes on the lexical parsing:
- uppercase letters are parsed as if they were lowercase letters
- white spaces and the HTML non-breaking space encoding "&nbsp;" are ignored
- Java/C++ comments ("/* ... */" and "//...") are ignored
- HTML tags are ignored but the content of HTML comments is parsed
- annotations are enclosed within "(^" and "^)"
- strings may be double quoted or enclosed within "$(" and ")$" (because of
the use of quotes and backquotes to embed sentences, strings cannot also be
single-quoted in FE)
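
The lexical notes above can be sketched as a preprocessing pass in Python. This is a rough illustration under several assumptions, not the actual FE lexer: the name normalize is hypothetical; strings are assumed to be kept verbatim (including their case); "//" is assumed not to start a comment when it directly follows ":" (so that URL terms survive); an HTML tag is assumed to start with "<" or "</" immediately followed by a letter, which may conflict with the "<" relation in some inputs; annotations are left in place for the parser; and runs of ignored material are reduced to a single separating space.

```python
import re

_TOKEN = re.compile(
    r'''
      (?P<string>"[^"]*"|\$\(.*?\)\$)   # strings, kept verbatim
    | (?P<block>/\*.*?\*/)              # block comment, dropped
    | (?P<line>(?<!:)//[^\n]*)          # line comment, dropped
    | (?P<htmlc><!--|-->)               # HTML comment delimiters, content kept
    | (?P<tag></?[a-zA-Z][^<>]*>)       # HTML tag, dropped
    | (?P<nbsp>&nbsp;)                  # non-breaking space entity
    | (?P<space>\s+)                    # ordinary white space
    | (?P<other>.)                      # any other character
    ''',
    re.VERBOSE | re.DOTALL,
)


def normalize(source):
    """Apply the lexical notes to source and return the cleaned text."""
    out = []
    for match in _TOKEN.finditer(source):
        kind = match.lastgroup
        if kind == "string":
            out.append(match.group())           # assumption: case preserved
        elif kind == "other":
            out.append(match.group().lower())   # uppercase read as lowercase
        elif out and out[-1] != " ":
            out.append(" ")                     # one separator per ignored run
    return "".join(out).strip()
```

For example, normalize('A&nbsp;Cat') gives 'a cat', and the text of an HTML comment is kept while its delimiters are dropped.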