【Problem】
While working through:
【Mostly solved】A grammar containing {skip();} in ANTLR v3 fails when debugging the parse: org.antlr.runtime.EarlyExitException
I changed the grammar to:
BLANKS : (' '|'\t')+ {$channel=HIDDEN;};
startParse : manufacture deviceType deviceRevison ddRevision;
manufacture : 'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
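With this change the numeric value was recognized correctly, but a MissingTokenException appeared instead.
【Solution process】
1. Clearly, I still had not fully understood the meaning of
{skip();} and
{$channel=HIDDEN;}.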
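For context, the two actions differ mainly in whether a token survives lexing. The grammar below is a minimal sketch based on general ANTLR v3 behaviour, not on this project's grammar; the names `ChannelDemo`, `pair`, `SPACES`, and `NEWLINES` are made up for illustration.

```antlr
grammar ChannelDemo;   // hypothetical name, for illustration only

// Parser rule: whitespace is deliberately NOT mentioned here, because
// the parser does not receive skipped or hidden-channel tokens.
pair : ID NUMBER ;

ID     : ('a'..'z'|'A'..'Z'|'_')+ ;
NUMBER : ('0'..'9')+ ;

// skip(): the matched text is discarded and no token is emitted.
SPACES   : (' '|'\t')+ {skip();} ;

// $channel=HIDDEN: a token is emitted and stays in the token stream,
// but on the hidden channel, which the parser does not read by default.
NEWLINES : ('\r'|'\n')+ {$channel=HIDDEN;} ;
```

In either case a parser rule written as `pair : ID SPACES NUMBER ;` would wait for a `SPACES` token that never reaches it, which is one plausible source of exceptions like the ones in this post.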
2. Reference:
cannot debug simple channel flag in ANTLR with Eclipse
Not much help. In that case the author had mistyped
{$channel = HIDDEN;} as
($channel = HIDDEN;)
I have no such syntax problem here.
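The difference between the two spellings is small on the page, so here is a short sketch of it as a lexer fragment meant to sit inside a grammar. The rule name `WS_DEMO` is hypothetical.

```antlr
// Braces delimit an embedded action, which is copied into the
// generated lexer code.
WS_DEMO : (' '|'\t')+ {$channel = HIDDEN;} ;

// Parentheses open a grammar subrule instead, so ANTLR would try to read
// "$channel = HIDDEN;" as rule elements and reject the grammar.
// WS_DEMO : (' '|'\t')+ ($channel = HIDDEN;) ;
```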
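3. Reference:
It looks as though a poorly written grammar, one that ends up not being context-free, can cause this kind of problem.
So I went back to check the grammar and see whether I could spot anything myself.
4. Changed to: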
//BLANKS : (' '|'\t')+ {$channel=HIDDEN;};
BLANKS : ' '+ {$channel=HIDDEN;};
startParse : manufacture deviceType deviceRevison ddRevision;
manufacture : 'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
Same error as before: still MissingTokenException.
5. I suspected that
BLANK+
and
BLANKS were now conflicting with each other, so I changed the current:
BLANK : (' '|'\t') {$channel=HIDDEN;};
//BLANKS : (' '|'\t')+ {$channel=HIDDEN;};
BLANKS : ' '+ {$channel=HIDDEN;};
//startParse : include* identification+;
//startParse : include+ identification+;
//startParse : identification+;
startParse : manufacture deviceType deviceRevison ddRevision;
manufacture : 'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture : 'MANUFACTURER'^ (BLANK+! (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceType : 'DEVICE_TYPE'^ BLANK+ (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceRevison : 'DEVICE_REVISION'^ BLANK+ (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
ddRevision : 'DD_REVISION'^ BLANK+ (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
to:
//BLANK : (' '|'\t') {$channel=HIDDEN;};
//BLANKS : (' '|'\t')+ {$channel=HIDDEN;};
BLANKS : ' '+ {$channel=HIDDEN;};
//startParse : include* identification+;
//startParse : include+ identification+;
//startParse : identification+;
startParse : manufacture deviceType deviceRevison ddRevision;
manufacture : 'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture : 'MANUFACTURER'^ (BLANK+! (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceType : 'DEVICE_TYPE'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceRevison : 'DEVICE_REVISION'^ BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
ddRevision : 'DD_REVISION'^ BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
Same error as before.
6. Next I switched to the skip form:
//BLANKS : (' '|'\t')+ {$channel=HIDDEN;};
BLANKS : (' '|'\t')+ {skip();};
//BLANKS : ' '+ {$channel=HIDDEN;};
//startParse : include* identification+;
//startParse : include+ identification+;
//startParse : identification+;
startParse : manufacture deviceType deviceRevison ddRevision;
manufacture : 'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture : 'MANUFACTURER'^ (BLANK+! (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceType : 'DEVICE_TYPE'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceRevison : 'DEVICE_REVISION'^ BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
ddRevision : 'DD_REVISION'^ BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
Again, the same error.
7. Removed the extra whitespace inside the rules, giving:
//BLANKS : (' '|'\t')+ {$channel=HIDDEN;};
BLANKS : (' '|'\t')+ {skip();};
//BLANKS : ' '+ {$channel=HIDDEN;};
//startParse : include* identification+;
//startParse : include+ identification+;
//startParse : identification+;
startParse : manufacture deviceType deviceRevison ddRevision;
manufacture : 'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture : 'MANUFACTURER'^ (BLANK+! (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceType : 'DEVICE_TYPE'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceRevison : 'DEVICE_REVISION'^ BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
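ddRevision : 'DD_REVISION'^ BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
Same error as before.
So the problem is not caused by stray spaces or tabs in the grammar source.
8. Could it be that
DIGIT and HEX_DIGIT in the earlier part of the grammar conflict?
The corresponding definitions are: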
fragment
DIGIT
: '0'..'9';
//FAKE_TOKEN : '1' '2' '3';
/*
DECIMAL_VALUE
: '1'..'9' DIGIT*;
*/
DECIMAL_VALUE
: DIGIT*;
HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
So I removed the duplicated definition and changed it to:
fragment
DIGIT
: '0'..'9';
//FAKE_TOKEN : '1' '2' '3';
/*
DECIMAL_VALUE
: '1'..'9' DIGIT*;
*/
DECIMAL_VALUE
: DIGIT*;
//HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
HEX_DIGIT : (DIGIT|'a'..'f'|'A'..'F') ;
HEX_VALUE
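: '0x' HEX_DIGIT+;
Same error as before.
9. I consulted:
[antlr-interest] C Runtime problem with $channel=HIDDEN and SKIP()
Could it be that the Java target's
{$channel=HIDDEN;}
also has a bug, and that this is what causes the MissingTokenException?
10. Later I found that MissingTokenException was newly added in version 3.1, in order to provide more detailed error information.
11. Then I changed it to: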
//BLANKS : (' '|'\t')+ {$channel=HIDDEN;};
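BLANKS : ((' '|'\t')+) {$channel=HIDDEN;};
Same error as before.
12. Then I looked more closely at how the MissingTokenException was being produced.
It seemed that the exception was raised after the value 0x1E6D11 had been examined several times.
In other words, the MissingTokenException appeared to have nothing to do with the preceding
BLANKS : (' '|'\t')+ {$channel=HIDDEN;};
and instead to be related to the part of the rule that follows it.
So I went on to examine that part:
(DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
to see whether something in it was written incorrectly.
13. First, removed the exclamation mark: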
//manufacture : 'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture : 'MANUFACTURER'^ (BLANK+! (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
manufacture : 'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) ','? WS*;
Same error as before.
14. I wondered whether DECIMAL_VALUE or HEX_VALUE was written incorrectly.
So I changed:
//DECIMAL_VALUE : DIGIT*;
DECIMAL_VALUE : DIGIT+;
Same error as before.
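Independently of the exception, `DIGIT+` is the safer form: `DIGIT*` lets `DECIMAL_VALUE` match the empty string, and a token rule that can match nothing is generally a problem for a lexer. Below is a sketch of number rules that always consume at least one character, written as a lexer fragment meant to sit inside a grammar. Marking `HEX_DIGIT` as `fragment` is my assumption, not part of the original grammar.

```antlr
fragment DIGIT     : '0'..'9' ;
fragment HEX_DIGIT : DIGIT | 'a'..'f' | 'A'..'F' ;  // assumed fragment

DECIMAL_VALUE : DIGIT+ ;           // at least one digit
HEX_VALUE     : '0x' HEX_DIGIT+ ;  // "0x" followed by at least one hex digit
```

With `HEX_DIGIT` as a helper only, a lone character such as `5` or `A` cannot surface as a separate `HEX_DIGIT` token.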
15. Swapped the order of HEX_VALUE and DECIMAL_VALUE:
//manufacture : 'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) ','? WS*;
manufacture : 'MANUFACTURER'^ BLANKS (HEX_VALUE | DECIMAL_VALUE) ','? WS*;
Same error as before.
16. Changed the skip in WS to hidden:
//fragment WS : ( ' ' | '\t' | '\r' | '\n') {skip();};
fragment WS : ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
This time the build failed outright:
[14:05:28] D:\DevRoot\IndustrialMobileAutomation\HandheldDataSetter\ANTLR\projects\v1.5\DDParserDemo\output\DDParserDemoLexer.java:593: error: cannot find symbol
[14:05:28] ^
[14:05:28] symbol: variable _channel
[14:05:28] location: class DDParserDemoLexer
[14:05:28] 1 error
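The compile error is consistent with how ANTLR v3 treats `fragment` rules: they are helpers called from other lexer rules and do not produce tokens of their own, so the generated method has no channel variable for `$channel` to refer to. Here is a minimal sketch of the usual split, as a lexer fragment meant to sit inside a grammar. The rule names `WS_CHAR` and `WHITESPACE` are hypothetical.

```antlr
// Helper only: never emitted as a token, so actions that touch token
// properties such as $channel do not belong here.
fragment
WS_CHAR : ' ' | '\t' | '\r' | '\n' ;

// Real token rule: this is where the channel can be set.
WHITESPACE : WS_CHAR+ {$channel=HIDDEN;} ;
```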
17. So I removed fragment again:
//fragment WS : ( ' ' | '\t' | '\r' | '\n') {skip();};
//fragment WS : ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
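WS : ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
Same error as before: still MissingTokenException.
18. After studying it carefully again, I found that the MissingTokenException actually seemed to occur before the number 0x1E6D11 was recognized, so BLANKS still needed work.
Changed to: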
//BLANKS : (' '|'\t')+ {$channel=HIDDEN;};
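BLANKS : (' '|'\t')+;
With this, the whitespace was finally recognized correctly.
Strangely, though, why can skip() or hidden not be attached to a run of several spaces here?
19. So I changed BLANKS back to BLANK and added hidden at the same time: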
BLANK : (' '|'\t') {$channel=HIDDEN;};
//BLANKS : (' '|'\t')+ {skip();};
//BLANKS : ' '+ {$channel=HIDDEN;};
//startParse : include* identification+;
//startParse : include+ identification+;
//startParse : identification+;
startParse : manufacture deviceType deviceRevison ddRevision;
//manufacture : 'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture : 'MANUFACTURER'^ (BLANK+! (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture : 'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) ','? WS*;
manufacture : 'MANUFACTURER'^ BLANK+ (HEX_VALUE | DECIMAL_VALUE) ','? WS*;
deviceType : 'DEVICE_TYPE'^ BLANK+ (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceRevison : 'DEVICE_REVISION'^ BLANK+ (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
ddRevision : 'DD_REVISION'^ BLANK+ (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
This brought back the original org.antlr.runtime.EarlyExitException.
So it cannot be used this way.
20. Then I tried skip again:
//BLANK : (' '|'\t') {$channel=HIDDEN;};
BLANK : (' '|'\t') {skip();};
This reported an error:
[14:27:15] error(208): DDParserDemo.g:119:1: The following token definitions can never be matched because prior tokens match the same input: BLANK
Looking into it, the existing WS rule already matches the same input, so I changed the grammar to:
/*
BLANKSPACE_TAB
// : (' ' | '\t'){skip();};
: (' ' | '\t')
{$channel=HIDDEN;};
*/
//fragment BLANK : (' '|'\t')+ {skip();};
//BLANK : (' '|'\t') {skip();};
//BLANK : (' '|'\t');
//BLANK : (' '|'\t') {$channel=HIDDEN;};
//BLANKS : (' '|'\t')+ {$channel=HIDDEN;};
//BLANKS : (' '|'\t')+ {$channel=HIDDEN;};
//BLANKS : (' '|'\t')+;
//BLANK : (' '|'\t') {$channel=HIDDEN;};
//BLANK : (' '|'\t') {skip();};
//BLANKS : (' '|'\t')+ {skip();};
//BLANKS : ' '+ {$channel=HIDDEN;};
//startParse : include* identification+;
//startParse : include+ identification+;
//startParse : identification+;
startParse : manufacture deviceType deviceRevison ddRevision;
//manufacture : 'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture : 'MANUFACTURER'^ (BLANK+! (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture : 'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) ','? WS*;
manufacture : 'MANUFACTURER'^ WS+ (HEX_VALUE | DECIMAL_VALUE) ','? WS*;
deviceType : 'DEVICE_TYPE'^ WS+ (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceRevison : 'DEVICE_REVISION'^ WS+ (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
ddRevision : 'DD_REVISION'^ WS+ (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
The result was still the EarlyExitException.
So it seems skip or hidden still cannot be used here.
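The error(208) seen in step 20 reflects the fact that every character BLANK can match is already matched by the earlier WS rule, so for a single character of input nothing tells the two apart. Below is a sketch of one way to keep the character sets disjoint, assuming blanks and line breaks are meant to be separate tokens. It is a lexer fragment meant to sit inside a grammar, and the rule name `NEWLINES` is illustrative.

```antlr
// Spaces and tabs only.
BLANKS   : (' '|'\t')+ ;

// Line breaks only, so no input character is claimed by both rules.
NEWLINES : ('\r'|'\n')+ ;
```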
21. In the end, I used this grammar:
grammar DDParserDemo;
options {
output = AST;
ASTLabelType = CommonTree; // type of $stat.tree ref etc...
}
//NEWLINE : '\r'? '\n' ;
//NEWLINE : '\r' '\n' ;
fragment
NEWLINE : '\r'? '\n' ;
fragment
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
fragment
FLOAT
: ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
| '.' ('0'..'9')+ EXPONENT?
| ('0'..'9')+ EXPONENT
;
COMMENT
: '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
| '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
;
//fragment WS : ( ' ' | '\t' | '\r' | '\n') {skip();};
//fragment WS : ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
WS : ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
STRING
: '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
;
CHAR: '\'' ( ESC_SEQ | ~('\''|'\\') ) '\''
;
fragment
EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
ESC_SEQ
: '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
| UNICODE_ESC
| OCTAL_ESC
;
fragment
OCTAL_ESC
: '\\' ('0'..'3') ('0'..'7') ('0'..'7')
| '\\' ('0'..'7') ('0'..'7')
| '\\' ('0'..'7')
;
fragment
UNICODE_ESC
: '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
;
fragment
DIGIT
: '0'..'9';
//FAKE_TOKEN : '1' '2' '3';
/*
DECIMAL_VALUE
: '1'..'9' DIGIT*;
*/
//DECIMAL_VALUE : DIGIT*;
DECIMAL_VALUE : DIGIT+;
//HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
HEX_DIGIT : (DIGIT|'a'..'f'|'A'..'F') ;
HEX_VALUE
: '0x' HEX_DIGIT+;
fragment
HEADER_FILENAME
: ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'_')*;
/*
//singleInclude : '#include' ' '+ '"' ID '.h"' ;
//singleInclude : '#include' ' '+ '"' ID+ '.h"' ;
//singleInclude : '#include' ' '+ '"' HEADER_FILENAME '.h"';
//singleInclude : '#include' ' ' '"' HEADER_FILENAME '.h"';
//singleInclude : '#include "' HEADER_FILENAME '.h"';
//fragment singleInclude : '#include' (' ')+ '"' ID '.h"';
//singleInclude : '#include' (' '|'\t')+ '""' ID '.h"';
//singleInclude : '#include' (' '|'\t')+ '"std_defs.h"';
singleInclude : '#include' (' '|'\t')+ ID '.h';
include : singleInclude WS* -> singleInclude;
*/
/*
BLANKSPACE_TAB
// : (' ' | '\t'){skip();};
: (' ' | '\t')
{$channel=HIDDEN;};
*/
//fragment BLANK : (' '|'\t')+ {skip();};
//BLANK : (' '|'\t') {skip();};
//BLANK : (' '|'\t');
//BLANK : (' '|'\t') {$channel=HIDDEN;};
//BLANKS : (' '|'\t')+ {$channel=HIDDEN;};
//BLANKS : (' '|'\t')+ {$channel=HIDDEN;};
//BLANKS : (' '|'\t')+;
//BLANK : (' '|'\t') {$channel=HIDDEN;};
//BLANK : (' '|'\t') {skip();};
BLANKS : (' '|'\t')+;
//BLANKS : (' '|'\t')+ {skip();};
//BLANKS : ' '+ {$channel=HIDDEN;};
//startParse : include* identification+;
//startParse : include+ identification+;
//startParse : identification+;
startParse : manufacture deviceType deviceRevison ddRevision;
//manufacture : 'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture : 'MANUFACTURER'^ (BLANK+! (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
//manufacture : 'MANUFACTURER'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) ','? WS*;
manufacture : 'MANUFACTURER'^ BLANKS (HEX_VALUE | DECIMAL_VALUE) ','? WS*;
deviceType : 'DEVICE_TYPE'^ BLANKS (DECIMAL_VALUE | HEX_VALUE) (','?)! WS*;
deviceRevison : 'DEVICE_REVISION'^ BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
ddRevision : 'DD_REVISION'^ BLANKS (DECIMAL_VALUE | HEX_VALUE)(','?)! WS*;
//identification : definiton WS* (','?)! WS* -> definiton;
//definiton : (ID)^ ('\t'!|' '!)+ (DECIMAL_VALUE | HEX_VALUE)
//definiton : (ID)^ BLANKSPACE_TAB+ (DECIMAL_VALUE | HEX_VALUE)
//definiton : ID ('\t'!|' '!)+ (DECIMAL_VALUE | HEX_VALUE);
to match:
MANUFACTURER 0x1E6D11, DEVICE_TYPE 0x00FF, DEVICE_REVISION 5, DD_REVISION 1
which then produced a tree structure.
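【Summary】
1. In this grammar, where the parser rules reference the whitespace token explicitly, the rule that matches spaces or tabs cannot use skip() or $channel=HIDDEN; otherwise parsing fails.
2. With WS already defined, a separate single-character BLANK for space or tab cannot be defined as well. It duplicates WS and is reported as:
The following token definitions can never be matched because prior tokens match the same input: BLANK
3. In the end the only option was to define BLANKS on its own:
BLANKS : (' '|'\t')+;
and then use it in the parser rules:
manufacture : 'MANUFACTURER'^ BLANKS (HEX_VALUE | DECIMAL_VALUE) ','? WS*;
With this:
- the input, including whitespace, is recognized correctly;
- but the recognized whitespace can no longer be hidden or skipped. For now there seems to be no way to get that effect.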
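For comparison, the arrangement usually seen in ANTLR v3 grammars hides whitespace in the lexer and then leaves it out of the parser rules entirely, since hidden or skipped tokens are not delivered to the parser. The following is an untested sketch of that idea for the input above. The grammar name `DDParserSketch` and the `fragment` modifiers are my assumptions rather than part of the original grammar.

```antlr
grammar DDParserSketch;   // hypothetical name

options {
    output = AST;
    ASTLabelType = CommonTree;
}

startParse    : manufacture deviceType deviceRevison ddRevision ;

// No BLANKS or WS references: the parser only sees default-channel tokens.
manufacture   : 'MANUFACTURER'^    (HEX_VALUE | DECIMAL_VALUE) (','?)! ;
deviceType    : 'DEVICE_TYPE'^     (HEX_VALUE | DECIMAL_VALUE) (','?)! ;
deviceRevison : 'DEVICE_REVISION'^ (HEX_VALUE | DECIMAL_VALUE) (','?)! ;
ddRevision    : 'DD_REVISION'^     (HEX_VALUE | DECIMAL_VALUE) (','?)! ;

fragment DIGIT     : '0'..'9' ;
fragment HEX_DIGIT : DIGIT | 'a'..'f' | 'A'..'F' ;

HEX_VALUE     : '0x' HEX_DIGIT+ ;
DECIMAL_VALUE : DIGIT+ ;

// Whitespace is tokenized once, here, and routed away from the parser.
WS : (' '|'\t'|'\r'|'\n')+ {$channel=HIDDEN;} ;
```

If this behaves as expected, the whitespace would be hidden and the tree would contain only the keywords and their values. I have not run it against the project, so treat it as a starting point rather than a verified fix.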
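【Postscript】
1. Later I came across this:
what is wrong with this grammar
The poster's argument seemed reasonable, and I suspect the same thing:
this MissingTokenException may be a bug in ANTLR (or ANTLRWorks).
After all, the grammar appears to be fine and the code runs normally, so this error should not be reported.
Of course, whether it really is a bug still needs to be confirmed by someone who understands this better.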
Please credit the source when reposting: 在路上 » 【Mostly solved】ANTLR v3: using a grammar with {$channel=HIDDEN;} makes parsing fail with MissingTokenException