【Background】
While working on:
【Record】Converting the ANTLR v2 C/C++ preprocessor grammar (cpp.g) to ANTLR v3
I referred to the original ANTLR v2 code:
IDENTIFIER @init{
List define = new ArrayList();
List foundArgs = new ArrayList();
String callArg0Text = "";
String callArg1Text = "";
} :
identifier=RAW_IDENTIFIER
{
// see if this is a macro argument
define = (List)defineArgs.get(identifier.getText());
if (define==null) {
// see if this is a macro call
define = (List)defines.get(identifier.getText());
}
}
( { (define!=null) && (define.size()>1) }? (WS|COMMENT)?
// take in arguments if macro call requires them
'('
callArg0=EXPR
{
callArg0Text = callArg0.getText();
foundArgs.add(callArg0Text);
}
( COMMA callArg1=EXPR
{
callArg1Text = callArg1.getText();
foundArgs.add(callArg1Text);
}
)*
{ foundArgs.size()==define.size()-1 }? // better have right amount
')'
| { !((define!=null) && (define.size()>1)) }?
)
which matches either an invocation of a define (macro) or an ordinary identifier.
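A plain-Java sketch of the decision this rule makes may help. It assumes that `defines` and `defineArgs` map a name to a `List` whose element 0 is the replacement text and whose remaining elements are the parameter names. The action code further down suggests this layout: `define.get(0)` is used as the text and `define.get(1+i)` as parameter names. The class and method names (`MacroLookup`, `needsArguments`, `argumentCountMatches`) are hypothetical and belong neither to cpp.g nor to the ANTLR runtime.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the lookup done in IDENTIFIER's first action.
// Each value is a List: element 0 = replacement text, elements 1..n = parameter names.
class MacroLookup {
    final Map<String, List<String>> defines = new HashMap<>();
    final Map<String, List<String>> defineArgs = new HashMap<>();

    // Macro arguments are checked first, then global defines (same order as the rule).
    List<String> lookup(String identifier) {
        List<String> define = defineArgs.get(identifier);
        if (define == null) {
            define = defines.get(identifier);
        }
        return define;
    }

    // Condition guarding the "( arg, arg, ... )" alternative: a function-like macro.
    static boolean needsArguments(List<String> define) {
        return define != null && define.size() > 1;
    }

    // The trailing check: exactly one argument per parameter.
    static boolean argumentCountMatches(List<String> define, List<String> foundArgs) {
        return foundArgs.size() == define.size() - 1;
    }

    public static void main(String[] args) {
        MacroLookup t = new MacroLookup();
        t.defines.put("PI", List.of("3.14"));                          // #define PI 3.14
        t.defines.put("MAX", List.of("((a)>(b)?(a):(b))", "a", "b")); // #define MAX(a,b) ...
        System.out.println(needsArguments(t.lookup("PI")));   // false: object-like macro
        System.out.println(needsArguments(t.lookup("MAX")));  // true: function-like macro
        System.out.println(needsArguments(t.lookup("foo")));  // false: ordinary identifier
        System.out.println(argumentCountMatches(t.lookup("MAX"), List.of("x", "y"))); // true
    }
}
```

Only the function-like case (`size() > 1`) should go on to consume a parenthesized argument list. That is exactly what the predicate in front of `'('` is meant to express.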
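Later I understood that the conditional matching is done through
{ (define!=null) && (define.size()>1) }?
That is, only when define is non-null and its size is greater than 1 does the rule go on to match: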
(WS|COMMENT)?
// take in arguments if macro call requires them
'('
callArg0=EXPR
{
callArg0Text = callArg0.getText();
foundArgs.add(callArg0Text);
}
( COMMA callArg1=EXPR
{
callArg1Text = callArg1.getText();
foundArgs.add(callArg1Text);
}
)*
{ foundArgs.size()==define.size()-1 }? // better have right amount
')'
If the condition does not hold, it instead matches the alternative after the '|' operator:
{ !((define!=null) && (define.size()>1)) }?
【Solution process】
1. So the question becomes: how do you do conditional matching inside a lexer rule in ANTLR v3?
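2. Someone else ran into a problem similar to mine.
Although there was no direct answer, the discussion mentions:
- ({boolExpr}?): called a disambiguating/validating semantic predicate
- ({boolExpr}?=>): the gated semantic predicate, which is what is actually needed
Its code: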
fragment VERSION_COMMENT_TAIL[bool matches_version]:
{!matches_version}? => ( options { greedy = false; }: . )* '*' '/' { $type = MULTILINE_COMMENT; $channel = 98; }
| { $type = VERSION_COMMENT; $channel = 98; }
;
gave me a hint: the pattern is
{xxx}? => yyy{do_A} | {do_B}
which is very similar to the case here.
3. On this topic, the official ANTLR v2 documentation:
http://www.antlr2.org/doc/lexer.html
explains:
DEFINE
: {getColumn()==1}? "#define" ID
;
Semantic predicates on the left-edge of single-alternative lexical rules get hoisted into the nextToken prediction mechanism. Adding the predicate to a rule makes it so that it is not a candidate for recognition until the predicate evaluates to true. In this case, the method for DEFINE would never be entered, even if the lookahead predicted #define, if the column > 1.
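This is also consistent with expectations. Namely, for the expression xxx in
{xxx}? => yyy{do_A}
if xxx does not hold, the corresponding content is not matched, and it stays out of consideration until the predicate holds.
That is not the effect I originally wanted, which was:
when xxx does not hold, skip this content and match the alternative after the '|' instead.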
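To illustrate what "hoisted into the nextToken prediction mechanism" means, here is a hand-written Java sketch. It is not ANTLR-generated code, and `HoistingDemo` and `nextTokenType` are hypothetical names. The predicate is tested together with the lookahead while a rule is being chosen. A false predicate therefore just removes DEFINE from the candidates, instead of raising an error inside the DEFINE rule.

```java
// Hand-written illustration of a predicate hoisted into nextToken() prediction
// (not ANTLR-generated code). Rule being modeled: DEFINE : {getColumn()==1}? "#define" ID ;
class HoistingDemo {
    // Returns which rule nextToken() would pick at 0-based offset pos of a line.
    static String nextTokenType(String line, int pos) {
        int column = pos + 1;                                  // columns are 1-based
        boolean lookaheadSaysDefine = line.startsWith("#define", pos);
        // The predicate is evaluated together with the lookahead, while choosing a rule.
        if (lookaheadSaysDefine && column == 1) {
            return "DEFINE";
        }
        // Predicate false: DEFINE is simply not a candidate; other rules get a chance.
        return "OTHER";
    }

    public static void main(String[] args) {
        System.out.println(nextTokenType("#define X 1", 0));   // DEFINE
        System.out.println(nextTokenType("  #define X 1", 2)); // OTHER (no exception)
    }
}
```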
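4. I also consulted the relevant pages of the ANTLR v4 official documentation, but still did not fully understand it, because in ANTLR v3 the corresponding grammar:
( { (define!=null) && (define.size()>1) }? (WS|COMMENT)?
generates the following Java code: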
switch (alt18) {
case 1 :
// D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:174:7: {...}? ( WS | COMMENT )? '(' callArg0= EXPR ( COMMA callArg1= EXPR )* {...}? ')'
{
if ( !(( (define!=null) && (define.size()>1) )) ) {
throw new FailedPredicateException(input, "IDENTIFIER", " (define!=null) && (define.size()>1) ");
}
// D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:174:48: ( WS | COMMENT )?
int alt16=3;
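int LA16_0 = input.LA(1);
Clearly, as soon as the condition
(define!=null) && (define.size()>1)
is not satisfied, an exception is thrown and execution stops there. It does not go on, as I expected, to test and match the alternative after the '|' operator:
{ !((define!=null) && (define.size()>1)) }?
This was quite puzzling.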
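Why does this throw so often? My understanding, which the original text does not confirm, is this: ANTLR v3 only pulls a plain {...}? into the prediction when lookahead alone cannot separate the alternatives. Here the next character is enough, so the predicate is checked only after alternative 1 has already been chosen.

The sketch below models that behavior. It assumes alternative 1 is predicted from the same character set that appears in the generated code of step 6 (tab, newline, carriage return, space, '(' and '/'). `ValidatingDecision` is a hypothetical name, and its exception class is a stand-in for the runtime's `FailedPredicateException`. Under this model, a plain identifier followed by a space, as in `int x`, lands in alternative 1 and fails.

```java
// Hypothetical model of the v3 decision for IDENTIFIER's optional call part when
// both alternatives carry plain (validating) predicates {...}?.
class ValidatingDecision {
    // Stand-in for the ANTLR runtime's FailedPredicateException.
    static class FailedPredicateException extends RuntimeException {
        FailedPredicateException(String predicate) { super(predicate); }
    }

    // Characters that can start alternative 1: (WS|COMMENT)? '(' ...
    // Assumed to be the set seen in the generated code: \t \n \r ' ' '(' '/'.
    static boolean startsAlt1(char la) {
        return (la >= '\t' && la <= '\n') || la == '\r' || la == ' ' || la == '(' || la == '/';
    }

    // isMacroCall stands for (define!=null) && (define.size()>1).
    static int decide(char la, boolean isMacroCall) {
        int alt = startsAlt1(la) ? 1 : 2;   // prediction uses the next character only
        // The predicates are only checked after the alternative has been chosen.
        if (alt == 1 && !isMacroCall) {
            throw new FailedPredicateException("(define!=null) && (define.size()>1)");
        }
        if (alt == 2 && isMacroCall) {
            throw new FailedPredicateException("!((define!=null) && (define.size()>1))");
        }
        return alt;
    }

    public static void main(String[] args) {
        System.out.println(decide(';', false));     // 2: "x;" is fine
        try {
            decide(' ', false);                     // "int x": plain identifier, then a space
        } catch (FailedPredicateException e) {
            System.out.println("failed predicate: " + e.getMessage());
        }
    }
}
```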
5. Next I swapped the order of the two alternatives:
( { !((define!=null) && (define.size()>1)) }?
|
{ (define!=null) && (define.size()>1) }? (WS|COMMENT)?
// take in arguments if macro call requires them
'('
callArg0=EXPR
{
callArg0Text = callArg0.getText();
foundArgs.add(callArg0Text);
}
( COMMA callArg1=EXPR
{
callArg1Text = callArg1.getText();
foundArgs.add(callArg1Text);
}
)*
{ foundArgs.size()==define.size()-1 }? // better have right amount
')'
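)
and tried it, but that still did not solve the problem; the behavior was the same as before.
Although execution now got past this part: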
switch (alt18) {
case 1 :
// D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:174:7: {...}?
{
if ( !(( !((define!=null) && (define.size()>1)) )) ) {
throw new FailedPredicateException(input, "IDENTIFIER", " !((define!=null) && (define.size()>1)) ");
}
}
break;
the code that follows:
if (define!=null) {
String defineText = (String)define.get(0);
if (define.size()==1) {
//only have one value in list -> the defineText is the define para content -> just need replace directly
setText(defineText);
} else {
//add new dict pair: (para, call value)
for (int i=0;i<foundArgs.size();++i) {
// treat macro arguments similar to local defines
List arg = new ArrayList();
arg.add((String)foundArgs.get(i));
defineArgs.put( (String)define.get(1+i), arg );
}
// save current lexer's state
SaveStruct ss = new SaveStruct(input);
includes.push(ss);
// switch on new input stream
setCharStream(new ANTLRStringStream(defineText));
reset();
}
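}
still could not run, because define really was null.
So the problem of selective matching in ANTLR v3 remained unsolved for the time being.
6. Following:
Forcing an alternative in ANTLR lexer rule
I changed the grammar to use the => form: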
({ (define!=null) && (define.size()>1) }?=> (WS|COMMENT)?
// take in arguments if macro call requires them
'('
callArg0=EXPR
{
callArg0Text = callArg0.getText();
foundArgs.add(callArg0Text);
}
( COMMA callArg1=EXPR
{
callArg1Text = callArg1.getText();
foundArgs.add(callArg1Text);
}
)*
{ foundArgs.size()==define.size()-1 }? // better have right amount
')'
| { !((define!=null) && (define.size()>1)) }?=>
)
and tried again. The generated code was still:
if ( ((LA18_0 >= '\t' && LA18_0 <= '\n')||LA18_0=='\r'||LA18_0==' '||LA18_0=='('||LA18_0=='/') && (((define!=null) && (define.size()>1)))) {
alt18=1;
}
switch (alt18) {
case 1 :
// D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:174:7: {...}? => ( WS | COMMENT )? '(' callArg0= EXPR ( COMMA callArg1= EXPR )* {...}? ')'
{
if ( !(((define!=null) && (define.size()>1))) ) {
throw new FailedPredicateException(input, "IDENTIFIER", "(define!=null) && (define.size()>1)");
}
......
if ( !(( foundArgs.size()==define.size()-1 )) ) {
throw new FailedPredicateException(input, "IDENTIFIER", " foundArgs.size()==define.size()-1 ");
}
match(')');
}
break;
case 2 :
// D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:190:7: {...}? =>
{
if ( !(( !((define!=null) && (define.size()>1)) )) ) {
throw new FailedPredicateException(input, "IDENTIFIER", " !((define!=null) && (define.size()>1)) ");
}
}
break;
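}
So apparently it would still throw exceptions.
At this point it seemed as if this kind of semantic predicate only worked as intended in ANTLR v2,
and that in ANTLR v3 its meaning had changed into a prediction that throws an exception whenever the condition is not met?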
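For comparison, here is a similar hypothetical model of the gated version. It follows the generated code: `alt18` starts at 2, as the step 9 listing shows, and becomes 1 only if the lookahead character fits and the predicate holds. Under that reading, an ordinary identifier can never reach the throw in case 1. The check in case 2 can still fire, but only when a function-like macro name is not followed by whitespace, '/' or '('. `GatedDecision` is a hypothetical name.

```java
// Hypothetical model of the decision generated for the gated version {...}?=>.
class GatedDecision {
    // Stand-in for the ANTLR runtime's FailedPredicateException.
    static class FailedPredicateException extends RuntimeException {
        FailedPredicateException(String predicate) { super(predicate); }
    }

    // Same assumed start set for alternative 1 as in the generated code.
    static boolean startsAlt1(char la) {
        return (la >= '\t' && la <= '\n') || la == '\r' || la == ' ' || la == '(' || la == '/';
    }

    // isMacroCall stands for (define!=null) && (define.size()>1).
    static int decide(char la, boolean isMacroCall) {
        int alt = 2;                               // default, like "int alt18=2;"
        if (startsAlt1(la) && isMacroCall) {       // gated: predicate is part of prediction
            alt = 1;
        }
        // Case 1 re-checks the predicate; it cannot fail after the gated prediction.
        if (alt == 1 && !isMacroCall) {
            throw new FailedPredicateException("(define!=null) && (define.size()>1)");
        }
        // Case 2 can still fail: a function-like macro not followed by WS, '/' or '('.
        if (alt == 2 && isMacroCall) {
            throw new FailedPredicateException("!((define!=null) && (define.size()>1))");
        }
        return alt;
    }
}
```

The only difference from a validating predicate is where the test sits: inside the prediction rather than after it. That is why `GatedDecision.decide(' ', false)` returns 2 instead of throwing.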
7. Later I referred to:
http://www.egtry.com/tools/antlr/gated_semantic_predicate
Its example:
Example 2
Given a sequence of digits, the first digit states how many digits to take next.
ANTLR grammar:
@init {
int len=0;
int count=0;
}
:
d1=DIGIT {len=Integer.parseInt($d1.text); System.out.println("size of the following digits: "+len);}
( { count< len }?=> d2=DIGIT {count++;System.out.println("element: "+$d2.text);} )+
(d3=DIGIT {System.out.println("Remaining Digit: "+$d3.text);})*
'\r'? '\n'
;
DIGIT: '0' .. '9';
Input example:
3123888
Output:
size of the following digits: 3
element: 1
element: 2
element: 3
Remaining Digit: 8
Remaining Digit: 8
Remaining Digit: 8
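This is clearly the effect we want:
the condition is tested and different statements run accordingly, instead of an exception being thrown whenever the condition fails.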
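The gated loop is easy to mirror in plain Java, and doing so shows why it never throws. The predicate `count < len` is simply the loop's continuation condition, so once it becomes false, control moves on to the `(DIGIT)*` part. This is a hand-written sketch, not generated code. It assumes the input is a line of digits without the trailing newline, and `GatedDigits` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written mirror of the egtry grammar (not generated code):
//   d1=DIGIT ( {count < len}?=> d2=DIGIT )+ (d3=DIGIT)*
class GatedDigits {
    final int len;
    final List<Integer> elements = new ArrayList<>();
    final List<Integer> remaining = new ArrayList<>();

    GatedDigits(String digits) {
        len = digits.charAt(0) - '0';                  // d1=DIGIT {len=...}
        int count = 0;
        int i = 1;
        // ( {count < len}?=> DIGIT )+ : the gate is just the loop condition
        while (i < digits.length() && count < len) {
            elements.add(digits.charAt(i) - '0');
            count++;
            i++;
        }
        if (elements.isEmpty()) {
            // (...)+ needs at least one iteration (ANTLR would report an error here)
            throw new IllegalArgumentException("expected at least one element digit");
        }
        // (d3=DIGIT)* : whatever is left over
        for (; i < digits.length(); i++) {
            remaining.add(digits.charAt(i) - '0');
        }
    }

    public static void main(String[] args) {
        GatedDigits r = new GatedDigits("3123888");
        System.out.println("size of the following digits: " + r.len);
        r.elements.forEach(d -> System.out.println("element: " + d));
        r.remaining.forEach(d -> System.out.println("Remaining Digit: " + d));
    }
}
```

For the input `3123888`, `main` prints the same seven lines as the example output above.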
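Since their example works, I first tested this grammar myself to check whether the generated code is as expected, i.e. does not throw exceptions indiscriminately.
The test code: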
grammar gatedSynmaticPredicateDemo;
options{
language=Java;
output = AST;
}
parseInput
@init {
int len=0;
int count=0;
}
:
d1=DIGIT {len=Integer.parseInt($d1.text); System.out.println("size of the following digits: "+len);}
( { count< len }?=> d2=DIGIT {count++;System.out.println("element: "+$d2.text);} )+
(d3=DIGIT {System.out.println("Remaining Digit: "+$d3.text);})*
'\r'? '\n'
;
DIGIT: '0' .. '9';
Then I located the generated code:
while (true) {
int alt1=2;
int LA1_0 = input.LA(1);
if ( (LA1_0==DIGIT) ) {
int LA1_1 = input.LA(2);
if ( (( count< len )) ) {
alt1=1;
}
}
switch (alt1) {
case 1 :
// D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\gatedSynmaticPredicateDemo\\gatedSynmaticPredicateDemo.g:15:5: {...}? =>d2= DIGIT
{
if ( !(( count< len )) ) {
throw new FailedPredicateException(input, "parseInput", " count< len ");
}
d2=(Token)match(input,DIGIT,FOLLOW_DIGIT_in_parseInput53);
d2_tree = (Object)adaptor.create(d2);
adaptor.addChild(root_0, d2_tree);
count++;System.out.println("element: "+(d2!=null?d2.getText():null));
}
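break;
However, it is in gatedSynmaticPredicateDemoParser.java, not in the Lexer.java file.
The test result was correct.
But this gated semantic predicate is clearly written in a parser rule, not in a lexer rule.
8. I also referred to:
[antlr-interest] Semantic Predicates in a Lexer
which seems to suggest that gated semantic predicates should be used in the parser.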
9. However, when I actually ran the grammar above:
( {(define!=null) && (define.size()>1)}?=> (WS|COMMENT)?
// take in arguments if macro call requires them
'('
callArg0=EXPR
{
callArg0Text = callArg0.getText();
foundArgs.add(callArg0Text);
}
( COMMA callArg1=EXPR
{
callArg1Text = callArg1.getText();
foundArgs.add(callArg1Text);
}
)*
{ foundArgs.size()==define.size()-1 }? // better have right amount
')'
| {!((define!=null) && (define.size()>1))}?=>
)
the generated code was:
// D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:174:5: ({...}? => ( WS | COMMENT )? '(' callArg0= EXPR ( COMMA callArg1= EXPR )* {...}? ')' |{...}? =>)
int alt18=2;
int LA18_0 = input.LA(1);
if ( ((LA18_0 >= '\t' && LA18_0 <= '\n')||LA18_0=='\r'||LA18_0==' '||LA18_0=='('||LA18_0=='/') && (((define!=null) && (define.size()>1)))) {
alt18=1;
}
switch (alt18) {
case 1 :
// D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:174:7: {...}? => ( WS | COMMENT )? '(' callArg0= EXPR ( COMMA callArg1= EXPR )* {...}? ')'
{
if ( !(((define!=null) && (define.size()>1))) ) {
throw new FailedPredicateException(input, "IDENTIFIER", "(define!=null) && (define.size()>1)");
}
......
if ( !(( foundArgs.size()==define.size()-1 )) ) {
throw new FailedPredicateException(input, "IDENTIFIER", " foundArgs.size()==define.size()-1 ");
}
match(')');
}
break;
case 2 :
// D:\\DevRoot\\IndustrialMobileAutomation\\HandheldDataSetter\\ANTLR\\projects\\v1.5\\HartEddlParser_local_TFS\\preprocess\\remove_comment\\preprocess.g:190:7: {...}? =>
{
if ( !((!((define!=null) && (define.size()>1)))) ) {
throw new FailedPredicateException(input, "IDENTIFIER", "!((define!=null) && (define.size()>1))");
}
}
break;
}
if (define!=null) {
String defineText = (String)define.get(0);
if (define.size()==1) {
//only have one value in list -> the defineText is the define para content -> just need replace directly
setText(defineText);
} else {
//add new dict pair: (para, call value)
for (int i=0;i<foundArgs.size();++i) {
// treat macro arguments similar to local defines
List arg = new ArrayList();
arg.add((String)foundArgs.get(i));
defineArgs.put( (String)define.get(1+i), arg );
}
// save current lexer's state
SaveStruct ss = new SaveStruct(input);
includes.push(ss);
// switch on new input stream
setCharStream(new ANTLRStringStream(defineText));
reset();
}
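}
The result: I set breakpoints in several places, and the throw statements were indeed never reached, i.e. no exception was thrown any more.
Execution then reached the code that actually needs to run.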
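Here is a simplified plain-Java model of what that action code does once the right alternative has been taken. An object-like macro is replaced directly (`setText`). A function-like macro binds each parameter name to its argument in `defineArgs`, and its body is then rescanned. cpp.g does the rescan by pushing the current input onto `includes` and switching to a new `ANTLRStringStream`. The class name `MacroExpander`, the `rescan` helper and the removal of the bindings afterwards are assumptions of this sketch. Nested macro calls inside a body are deliberately not handled.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of the expansion action in IDENTIFIER.
// Values: element 0 = replacement text, elements 1..n = parameter names.
class MacroExpander {
    final Map<String, List<String>> defines = new HashMap<>();
    final Map<String, List<String>> defineArgs = new HashMap<>();

    List<String> lookup(String name) {
        List<String> define = defineArgs.get(name);
        return define != null ? define : defines.get(name);
    }

    // foundArgs holds the call arguments (empty for a non-call).
    String expand(String identifier, List<String> foundArgs) {
        List<String> define = lookup(identifier);
        if (define == null) {
            return identifier;                         // ordinary identifier
        }
        String defineText = define.get(0);
        if (define.size() == 1) {
            return defineText;                         // like setText(defineText)
        }
        // Bind each parameter to its argument, as defineArgs.put(...) does.
        for (int i = 0; i < foundArgs.size(); ++i) {
            defineArgs.put(define.get(1 + i), List.of(foundArgs.get(i)));
        }
        try {
            return rescan(defineText);                 // stands in for setCharStream(...)
        } finally {
            // Assumption of this sketch: drop the bindings once the body is done.
            for (int i = 0; i < foundArgs.size(); ++i) {
                defineArgs.remove(define.get(1 + i));
            }
        }
    }

    // Re-lexes a body, replacing parameters and object-like macros.
    // Nested function-like macro calls are left untouched in this sketch.
    String rescan(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (!Character.isJavaIdentifierStart(c)) {
                out.append(c);
                i++;
                continue;
            }
            int start = i;
            while (i < text.length() && Character.isJavaIdentifierPart(text.charAt(i))) {
                i++;
            }
            String name = text.substring(start, i);
            List<String> define = lookup(name);
            out.append(define != null && define.size() == 1 ? define.get(0) : name);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        MacroExpander m = new MacroExpander();
        m.defines.put("MAX", List.of("((a)>(b)?(a):(b))", "a", "b"));
        System.out.println(m.expand("MAX", List.of("x", "y"))); // ((x)>(y)?(x):(y))
    }
}
```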
【Summary】
In the ANTLR v2 lexer, selective matching was implemented with
{testExpression}?
(apparently called a validating semantic predicate)
in code like this:
( { (define!=null) && (define.size()>1) }? (WS|COMMENT)?
// take in arguments if macro call requires them
'('
callArg0=EXPR
{
callArg0Text = callArg0.getText();
foundArgs.add(callArg0Text);
}
( COMMA callArg1=EXPR
{
callArg1Text = callArg1.getText();
foundArgs.add(callArg1Text);
}
)*
{ foundArgs.size()==define.size()-1 }? // better have right amount
')'
| { !((define!=null) && (define.size()>1)) }?
)
In the ANTLR v3 lexer, this needs to be changed to the form
{testExpression}?=>
(apparently called a gated semantic predicate):
( {(define!=null) && (define.size()>1)}?=> (WS|COMMENT)?
// take in arguments if macro call requires them
'('
callArg0=EXPR
{
callArg0Text = callArg0.getText();
foundArgs.add(callArg0Text);
}
( COMMA callArg1=EXPR
{
callArg1Text = callArg1.getText();
foundArgs.add(callArg1Text);
}
)*
{ foundArgs.size()==define.size()-1 }? // better have right amount
')'
| {!((define!=null) && (define.size()>1))}?=>
)
Only then does the selective matching actually work.
Please credit when reposting: 在路上 » 【Solved】Conditional matching in the ANTLR v3 lexer