Defining source editors with a DSL


Today it is quite cumbersome to define source editors that are built on top of the Eclipse Text Framework.

In the upcoming 2.1 release of e(fx)clipse we’ll ship a first set of components that make it dead simple to define editors for e4 on JavaFX applications (support for Eclipse 4.x and e4 on SWT could be added in the future as well).

One of the central components of the new code editing support is a DSL that allows you to define all relevant parts of your code editor (as of 2.1 it only deals with lexical syntax highlighting).

This blog post explains how to define a source editor for Google Dart. Follow-up posts will use the definition created here to build a complete editor like this:

[Screenshot: the complete Dart editor]

Overview

Let’s start from the bottom up:

The following file, stored as “dart.ldef”, defines the complete setup required for a source code editor with syntax highlighting:

package org.eclipse.fx.code.dart.text

dart {
	partitioning {
		partition __dftl_partition_content_type
		partition __dart_singlelinedoc_comment
		partition __dart_multilinedoc_comment
		partition __dart_singleline_comment
		partition __dart_multiline_comment
		partition __dart_string
		rule {
			single_line __dart_string "'" => "'"
			single_line __dart_string '"' => '"'
			single_line __dart_singlelinedoc_comment '///' => ''
			single_line __dart_singleline_comment '//' => ''
			multi_line __dart_multilinedoc_comment '/**' => '*/'
			multi_line __dart_multiline_comment '/*' => '*/'
		}
	}
	lexical_highlighting {
		rule __dftl_partition_content_type whitespace javawhitespace {
			default dart_default
			dart_operator {
				character [ ';', '.', '=', '/', '\\', '+', '-', '*', '<', '>', ':', '?', '!', ',', '|', '&', '^', '%', '~' ]
			}
			dart_bracket {
				character [ '(', ')', '{', '}', '[', ']' ]
			}
			dart_keyword {
				keywords [ 	  "break", "case", "catch", "class", "const", "continue", "default"
							, "do", "else", "enum", "extends", "false", "final", "finally", "for"
							,  "if", "in", "is", "new", "null", "rethrow", "return", "super"
							, "switch", "this", "throw", "true", "try", "var", "void", "while"
							, "with"  ]
			}
			dart_keyword_1 {
				keywords [ 	  "abstract", "as", "assert", "deferred"
							, "dynamic", "export", "external", "factory", "get"
							, "implements", "import", "library", "operator", "part", "set", "static"
							, "typedef" ]
			}
			dart_keyword_2 {
				keywords [ "async", "async*", "await", "sync*", "yield", "yield*" ]
			}
			dart_builtin_types {
				keywords [ "num", "String", "bool", "int", "double", "List", "Map" ]
			}
		}
		rule __dart_singlelinedoc_comment {
			default dart_doc
			dart_doc_reference {
				single_line "[" => "]"
			}
		}
		rule __dart_multilinedoc_comment {
			default dart_doc
			dart_doc_reference {
				single_line "[" => "]"
			}
		}
		rule __dart_singleline_comment {
			default dart_single_line_comment
		}
		rule __dart_multiline_comment {
			default dart_multi_line_comment
		}
		rule __dart_string {
			default dart_string
			dart_string_inter {
				single_line "${" => "}"
				//TODO We need a $ => IDENTIFIER_CHAR rule
			}
		}
	}
}

As you can see, the file is split into two big sections:

package org.eclipse.fx.code.dart.text

dart {
	partitioning {
		
	}
	lexical_highlighting {

	}
}
  • partitioning: this section defines how the different partitions in your file are identified. Most likely you'll have partitions for code, comments, … but ultimately it is up to you
  • lexical_highlighting: this section defines how a partition is tokenized, e.g. to highlight keywords, operators, …

Partitioning

partitioning {
	partition __dftl_partition_content_type
	partition __dart_singlelinedoc_comment
	partition __dart_multilinedoc_comment
	partition __dart_singleline_comment
	partition __dart_multiline_comment
	partition __dart_string
	rule {
		single_line __dart_string "'" => "'"
		single_line __dart_string '"' => '"'
		single_line __dart_singlelinedoc_comment '///' => ''
		single_line __dart_singleline_comment '//' => ''
		multi_line __dart_multilinedoc_comment '/**' => '*/'
		multi_line __dart_multiline_comment '/*' => '*/'
	}
}

The partitioning section starts with a list of all available partitions. At least __dftl_partition_content_type is required; for Dart we define:

  • __dftl_partition_content_type: This is probably the most important partition because this is where all your code goes
    class Rectangle {
      num left;   
      num top; 
      num width; 
      num height; 
      
      num get right             => left + width;
          set right(num value)  => left = value - width;
      num get bottom            => top + height;
          set bottom(num value) => top = value - height;
    }
    
  • __dart_singlelinedoc_comment: A single-line documentation comment
    /// This is single line doc [aNumber]
    printNumber(num aNumber) {}
    
  • __dart_multilinedoc_comment: A multi-line documentation comment
    /**
     * A multi line document comment [aNumber]
     */
    printNumber(num aNumber) {}
    
  • __dart_singleline_comment: A single-line comment
    var a = 12; // Single line comment
    
  • __dart_multiline_comment: A multi-line comment
    /*
     * A multi line comment
     */
    var a = 12;
    
  • __dart_string: A string literal in single or double quotes
    var a = 0;
    var s1 = 'A string ${a} with interpolation';
    var s2 = "A string ${a} with interpolation";
    

Once the list of partitions is defined, we need to define the rules used to identify the partitions. Currently two rule types are supported:

  • Single line rule: the rule starts when the start sequence is detected and ends at a line break or when the end sequence is found
    single_line __dart_singleline_comment '//' => ''
    
  • Multi line rule: the rule starts when the start sequence is detected and ends at the end of the file or when the end sequence is found
    multi_line  __dart_multiline_comment '/*' => '*/'
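To make the two rule types more concrete, here is a minimal Java sketch of how a rule-based partitioner could split a document with the partition rules from dart.ldef. It is not the e(fx)clipse implementation (Eclipse Text ships its own rule classes such as SingleLineRule and MultiLineRule in org.eclipse.jface.text.rules); the names SimplePartitioner, Rule and Region are hypothetical, and escape sequences such as \' inside strings are ignored. In the sketch, rules are tried in declaration order, so '///' must come before '//' and '/**' before '/*' (the same order used in dart.ldef).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified partitioner mirroring the single_line / multi_line
// partition rules of dart.ldef. Escape sequences (e.g. \' in strings) are ignored.
final class SimplePartitioner {

    public static final String DEFAULT = "__dftl_partition_content_type";

    // A partition rule: content type, start sequence, end sequence ("" = none),
    // and whether a line break ends the partition (single_line) or not (multi_line).
    public record Rule(String type, String start, String end, boolean singleLine) {}

    // A typed slice of the document covering [offset, offset + length).
    public record Region(String type, int offset, int length) {}

    // Rules are tried in declaration order, so '///' precedes '//' and '/**' precedes '/*'.
    public static final List<Rule> DART_RULES = List.of(
            new Rule("__dart_string", "'", "'", true),
            new Rule("__dart_string", "\"", "\"", true),
            new Rule("__dart_singlelinedoc_comment", "///", "", true),
            new Rule("__dart_singleline_comment", "//", "", true),
            new Rule("__dart_multilinedoc_comment", "/**", "*/", false),
            new Rule("__dart_multiline_comment", "/*", "*/", false));

    public static List<Region> partition(String text, List<Rule> rules) {
        List<Region> regions = new ArrayList<>();
        int pos = 0;
        int defaultStart = 0; // start of the pending default-partition text
        while (pos < text.length()) {
            Rule match = null;
            for (Rule r : rules) {
                if (text.startsWith(r.start(), pos)) {
                    match = r;
                    break;
                }
            }
            if (match == null) {
                pos++; // plain code: stays in the default partition
                continue;
            }
            if (pos > defaultStart) {
                regions.add(new Region(DEFAULT, defaultStart, pos - defaultStart));
            }
            int end = findEnd(text, pos + match.start().length(), match);
            regions.add(new Region(match.type(), pos, end - pos));
            pos = end;
            defaultStart = end;
        }
        if (pos > defaultStart) {
            regions.add(new Region(DEFAULT, defaultStart, pos - defaultStart));
        }
        return regions;
    }

    // Exclusive end offset of a partition whose body starts at 'from'.
    private static int findEnd(String text, int from, Rule rule) {
        int endSeq = rule.end().isEmpty() ? -1 : text.indexOf(rule.end(), from);
        int eol = rule.singleLine() ? text.indexOf('\n', from) : -1;
        if (endSeq >= 0 && (eol < 0 || endSeq < eol)) {
            return endSeq + rule.end().length(); // end sequence is part of the partition
        }
        if (eol >= 0) {
            return eol; // single_line: stop before the line break
        }
        return text.length(); // no end found: run to the end of the file
    }
}
```

For example, partition("/// doc", SimplePartitioner.DART_RULES) yields a single __dart_singlelinedoc_comment region; moving the '//' rule in front of the '///' rule would turn it into a plain __dart_singleline_comment. In this sketch a single-line partition stops before the line break, so the break itself belongs to the default partition.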
    

Lexical highlighting

lexical_highlighting {
	rule __dftl_partition_content_type whitespace javawhitespace {
		default dart_default
		dart_operator {
			character [ ';', '.', '=', '/', '\\', '+', '-', '*', '<', '>', ':', '?', '!', ',', '|', '&', '^', '%', '~' ]
		}
		dart_bracket {
			character [ '(', ')', '{', '}', '[', ']' ]
		}
		dart_keyword {
			keywords [ 
				"break", "case", "catch", "class", "const", "continue", "default"
				, "do", "else", "enum", "extends", "false", "final", "finally", "for"
				,  "if", "in", "is", "new", "null", "rethrow", "return", "super"
				, "switch", "this", "throw", "true", "try", "var", "void", "while"
				, "with"  ]
		}
		dart_keyword_1 {
			keywords [
				"abstract", "as", "assert", "deferred"
				, "dynamic", "export", "external", "factory", "get"
				, "implements", "import", "library", "operator", "part", "set", "static"
				, "typedef" ]
		}
		dart_keyword_2 {
			keywords [ "async", "async*", "await", "sync*", "yield", "yield*" ]
		}
		dart_builtin_types {
			keywords [ "num", "String", "bool", "int", "double", "List", "Map" ]
		}
	}
	rule __dart_singlelinedoc_comment {
		default dart_doc
		dart_doc_reference {
			single_line "[" => "]"
		}
	}
	rule __dart_multilinedoc_comment {
		default dart_doc
		dart_doc_reference {
			single_line "[" => "]"
		}
	}
	rule __dart_singleline_comment {
		default dart_single_line_comment
	}
	rule __dart_multiline_comment {
		default dart_multi_line_comment
	}
	rule __dart_string {
		default dart_string
		dart_string_inter {
			single_line "${" => "}"
			//TODO We need a $ => IDENTIFIER_CHAR rule
		}
	}
}

As you can see, for each partition from above we define how it is split into tokens (e.g. dart_default, dart_keyword, …) so that different kinds of tokens, e.g. keywords, can be colored differently.

We currently support the following tokenizer rules:

  • character: This rule lets you define single-character tokens such as operators, brackets, …
    dart_bracket {
    	character [ '(', ')', '{', '}', '[', ']' ]
    }
    
  • keywords: This rule lets you define a list of keywords
    dart_keyword {
    	keywords [ "break", "case", "catch", "class", "const", "continue", "default", ... ]
    }
    
  • single_line: This rule starts with the start sequence and ends at a line break or when the end sequence is matched
    dart_doc_reference {
    	single_line "[" => "]"
    }
    
  • multi_line: This rule starts with the start sequence and ends at the end of the file or when the end sequence is matched. We don’t need this rule for Dart
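The following Java sketch illustrates how the character, keywords and single_line rules could turn the text of one partition into tokens. It is a simplified stand-in for the rule-based scanners of Eclipse Text, not the e(fx)clipse implementation: the class SimpleTokenizer and its methods are hypothetical, whitespace is simply folded into the default token, and keywords are only matched as whole words (Java identifier characters, plus an optional trailing * so that entries like async* and yield* work). It also assumes that default names the token assigned to any text no other rule matches.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified scanner for the text of ONE partition, loosely
// modelled on the character / keywords / single_line rules of dart.ldef.
final class SimpleTokenizer {

    public record Token(String type, int offset, int length) {}

    private final String defaultToken; // assumed semantics of 'default': text no rule matches
    private final Map<Character, String> characters = new HashMap<>();
    private final Map<String, String> keywords = new HashMap<>();
    private final List<String[]> singleLines = new ArrayList<>(); // {type, start, end}

    public SimpleTokenizer(String defaultToken) {
        this.defaultToken = defaultToken;
    }

    public SimpleTokenizer character(String type, char... chars) {
        for (char c : chars) characters.put(c, type);
        return this;
    }

    public SimpleTokenizer keywords(String type, String... words) {
        for (String w : words) keywords.put(w, type);
        return this;
    }

    public SimpleTokenizer singleLine(String type, String start, String end) {
        singleLines.add(new String[] {type, start, end});
        return this;
    }

    public List<Token> tokenize(String text) {
        List<Token> out = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = -1;
            String type = defaultToken;
            // 1) single_line: start sequence up to the end sequence or the line break
            for (String[] rule : singleLines) {
                if (text.startsWith(rule[1], pos)) {
                    type = rule[0];
                    end = lineRuleEnd(text, pos + rule[1].length(), rule[2]);
                    break;
                }
            }
            char c = text.charAt(pos);
            if (end < 0 && Character.isJavaIdentifierStart(c)) {
                // 2) whole words only, so "classic" is not split into "class" + "ic"
                end = pos + 1;
                while (end < text.length() && Character.isJavaIdentifierPart(text.charAt(end))) {
                    end++;
                }
                String word = text.substring(pos, end);
                if (end < text.length() && text.charAt(end) == '*' && keywords.containsKey(word + "*")) {
                    word += "*"; // async*, sync*, yield*
                    end++;
                }
                type = keywords.getOrDefault(word, defaultToken);
            } else if (end < 0) {
                // 3) single characters (operators, brackets); anything else is default
                end = pos + 1;
                type = characters.getOrDefault(c, defaultToken);
            }
            add(out, type, pos, end - pos);
            pos = end;
        }
        return out;
    }

    // Exclusive end of a single_line token whose body starts at 'from'.
    private static int lineRuleEnd(String text, int from, String endSeq) {
        int eol = text.indexOf('\n', from);
        int limit = eol < 0 ? text.length() : eol;
        int hit = endSeq.isEmpty() ? -1 : text.indexOf(endSeq, from);
        return (hit >= 0 && hit < limit) ? hit + endSeq.length() : limit;
    }

    // Appends a token, merging adjacent default tokens into one.
    private void add(List<Token> out, String type, int offset, int length) {
        if (!out.isEmpty() && type.equals(defaultToken)) {
            Token last = out.get(out.size() - 1);
            if (last.type().equals(type) && last.offset() + last.length() == offset) {
                out.set(out.size() - 1, new Token(type, last.offset(), last.length() + length));
                return;
            }
        }
        out.add(new Token(type, offset, length));
    }
}
```

Tokenizing var x = classic; with the keywords class and var yields a dart_keyword token for var, while classic ends up in a default token instead of being split into class and ic. Matching whole words is the main design choice here; a plain prefix match would highlight keywords inside longer identifiers.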

From the token to the colored string

Now that we have defined how our source code should be tokenized, we need one final step: defining what a token such as “dart_keyword” means in the UI. For JavaFX we simply map the token names to JavaFX CSS class selectors.

.styled-text-area .dart.dart_default {
	-styled-text-color: rgb(0, 0, 0);
}

.styled-text-area .dart.dart_operator {
	-styled-text-color: rgb(0, 0, 0);
}

.styled-text-area .dart.dart_bracket {
	-styled-text-color: rgb(0, 0, 0);
}

.styled-text-area .dart.dart_keyword {
	-styled-text-color: rgb(127, 0, 85);
	-fx-font-weight: bold;
}

.styled-text-area .dart.dart_keyword_1 {
	-styled-text-color: rgb(127, 0, 85);
	-fx-font-weight: bold;
}

.styled-text-area .dart.dart_keyword_2 {
	-styled-text-color: rgb(127, 0, 85);
	-fx-font-weight: bold;
}

.styled-text-area .dart.dart_single_line_comment {
	-styled-text-color: rgb(63, 127, 95);
}

.styled-text-area .dart.dart_multi_line_comment {
	-styled-text-color: rgb(63, 127, 95);
}

.styled-text-area .dart.dart_string {
	-styled-text-color: rgb(42, 0, 255);
}

.styled-text-area .dart.dart_string_inter {
	-styled-text-color: rgb(42, 0, 255);
	-fx-font-weight: bold;
}

.styled-text-area .dart.dart_builtin_types {
	-styled-text-color: #74a567;
	-fx-font-weight: bold;
}

.styled-text-area .dart.dart_doc {
	-styled-text-color: rgb(63, 95, 191);
}

.styled-text-area .dart.dart_doc_reference {
	-styled-text-color: rgb(63, 95, 191);
	-fx-font-weight: bold;
}

The rule for building the selector is trivial. Let’s look at an example:

.styled-text-area .dart.dart_keyword {
	-styled-text-color: rgb(127, 0, 85);
	-fx-font-weight: bold;
}
...
dart {
	lexical_highlighting {
		rule __dftl_partition_content_type whitespace javawhitespace {
			dart_keyword {
				...
			}
		}
	}
}
...
  • .styled-text-area: Selector that narrows the scope to the styled text area, so the actual token selector only applies inside it
  • .dart.dart_keyword: Selector made up of the language name (dart) and the token name (dart_keyword)
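Since the selector is simply the language name and the token name combined into one compound class selector, the convention can be written down in a few lines of Java. The class TokenSelectors and its stylesheetStub helper are hypothetical and not part of e(fx)clipse; the helper just generates a starter stylesheet with one placeholder rule per token name. Note that a compound selector such as .dart.dart_keyword only matches a node that carries both style classes.

```java
import java.util.List;

// Hypothetical helper illustrating the naming convention: a token of a language
// definition is styled through ".styled-text-area .<language>.<token>".
final class TokenSelectors {

    public static String selectorFor(String language, String token) {
        return ".styled-text-area ." + language + "." + token;
    }

    // Generates a starter stylesheet with one placeholder rule per token name.
    public static String stylesheetStub(String language, List<String> tokens, String color) {
        StringBuilder css = new StringBuilder();
        for (String token : tokens) {
            css.append(selectorFor(language, token)).append(" {\n")
               .append("\t-styled-text-color: ").append(color).append(";\n")
               .append("}\n\n");
        }
        return css.toString();
    }
}
```

For example, selectorFor("dart", "dart_keyword") returns ".styled-text-area .dart.dart_keyword", the selector used for keywords in the stylesheet above; the generated stub rules can then be extended with properties such as -fx-font-weight.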

8 thoughts on “Defining source editors with a DSL”

  1. neil matatall July 30, 2015 / 7:18 pm

    This is great. I was able to follow along and create a ruby editor within a matter of minutes! It also runs well within my jrubyfx application, again almost without issue and it’s much better than the previous javascript-in-a-webview code editor that it replaced.

    Where can I find more information about the DSL? I’m having trouble modeling some things elegantly (possibly incorrectly). Will there be a maintained list of ldefs for various languages? I’ll also need ERB, HAML, JSON, YAML, etc. I’d love to build off of what others have and contribute where I can. I’ll publish the ruby ldef / editor project later this week.

    • Tom Schindl July 30, 2015 / 7:49 pm

      Perfect!

      On the DSL:
      In general the DSL is a frontend to Eclipse-Text😉

      There’s not much documentation for the DSL at the moment and we need support for more clever TokenRules (eg. one that allows to use a pattern instead of a character which is required to support string interpolation in script languages like dart, perl, python, …). If you are interested the rules stuff is at http://git.eclipse.org/c/efxclipse/org.eclipse.efxclipse.git/tree/bundles/code/org.eclipse.fx.text.

      On the central ldef store:
      I’ve created a folder in our git repo at Eclipse.org and pushed the dart definition (http://git.eclipse.org/c/efxclipse/org.eclipse.efxclipse.git/tree/bundles/code/ldef-store). I also already have definitions for Java, JavaScript and XML, but I need to translate them from a previous version of the DSL. If you file a bug and attach your ldef and css files (or, even better, create a gerrit-review) we can collect them there for now.

      • oreoshake July 30, 2015 / 8:48 pm

        > eg. one that allows to use a pattern instead of a character

        Yep, that’s what I was referring to as something that wasn’t elegant to model. I will gladly contribute the files and am looking forward to the javascript ldef.

    • Tom Schindl July 30, 2015 / 8:07 pm

      Oh, and for files that are made up of different languages (e.g. HTML+JavaScript+CSS, HTML+PHP, …) we need to extend the DSL so that a definition can be composed of several single languages
