Hi there, I've open-sourced my new library, RegexGen.js, a JavaScript regular expression generator. Please give it a try. Comments and issue reports are welcome. Thank you!
RegexGen.js - JavaScript Regular Expression Generator
RegexGen.js is a JavaScript regular expression generator that helps to construct complex regular expressions, inspired by JSVerbalExpressions.
RegexGen.js is designed for people who know how the regular expression engine works but don't work with it regularly, i.e., they know how to make a regex work but may not remember every meta-character used to construct it.
With RegexGen.js, you don't have to remember meta-characters, shortcuts, which characters to escape, or tricks for corner cases (http://stackoverflow.com/questions/5484084/what-literal-characters-should-be-escaped-in-a-regex/5484178#5484178).
RegexGen.js helps you reuse regex patterns (check out the Matching an IP Address example below).
The Problems
RegexGen.js tries to ease two problems.
- While creating a regular expression, it's hard to remember the correct syntax and which characters to escape.
- After creating a regular expression, it's hard to read it and remember what the regex does.
The Goals
RegexGen.js is designed to achieve the following goals.
- The written code should be easy to read and easy to understand.
- The generated code should be as compact as possible, e.g., no redundant brackets and parentheses.
- No more character escaping required (except for '\', or if you use regex overwrite).
- If the generated code is not good enough, the bad parts can easily be replaced directly in the written code.
Getting Started
The generator is exported as a regexGen() function.
To generate a regular expression, pass sub-expressions as parameters to the regexGen() function.
Sub-expressions separated by commas are concatenated to form the whole regular expression.
Sub-expressions can be a string, a number, a RegExp object, or any combination of calls to methods (i.e., the sub-generators) of the regexGen() function object, as described by the informal BNF syntax below.
Strings passed to regexGen(), text(), maybe(), anyCharOf() and anyCharBut() are always escaped as necessary, so you don't have to worry about which characters to escape.
The result of calling the regexGen() function is a RegExp object.
    var regexGen = require('regexgen.js');
The basic usage can be expressed in the following informal BNF syntax.
    regex ::= regexGen( sub-expression [, sub-expression ...] [, modifier ...] )
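To make the escaping rule concrete, here is a minimal sketch in plain JavaScript of what escaping a literal string for use in a regex involves. The escapeLiteral() helper is hypothetical and is not part of RegexGen.js; the library's own escaping logic may well differ.

```javascript
// Hypothetical helper, NOT part of RegexGen.js: put a backslash in front of
// every regex meta-character so the string only matches itself.
function escapeLiteral(text) {
    return text.replace(/[.*+?^${}()|[\]\\\/]/g, '\\$&');
}

var escapedSource = escapeLiteral('1+1=2 (maybe?)');
var literalRegex = new RegExp('^' + escapedSource + '$');

console.log(escapedSource);                        // 1\+1=2 \(maybe\?\)
console.log(literalRegex.test('1+1=2 (maybe?)'));  // true
console.log(literalRegex.test('11=2 maybe'));      // false
```

Without the escaping step, characters such as + and ( would be interpreted as quantifiers and groups instead of literal text.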
Please check out regexgen.js and the wiki for API documentation, and check out test.js for more examples.
Installation
npm install regexgen.js
Usage
Since the generator is exported as the regexGen() function, everything must be referenced through it.
To keep the code short, it is preferable to assign it to a short variable.
    var _ = require('regexgen.js');
Note: Though it is not recommended, if you still find this inconvenient and don't mind polluting the global object, use the regexGen.mixin() function to export all member functions of the regexGen() function object to the global object.
    var regexGen = require('regexgen.js');
About The Returned RegExp Object
The RegExp object returned from the call to regexGen() can be used directly as usual.
In addition, four properties are added to the RegExp object: the warnings array, the captures array, the extract() method and the replace() method.
Check out the wiki for details.
Examples
Simple Password Validation
This example is taken from the article: Mastering Lookahead and Lookbehind.
    var _ = require('regexgen.js');
Generates:
    /^(?=.*?[a-z])(?=.*?[A-Z])(?=.*?\d)\w{6,10}$/
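As a quick check of what the password pattern accepts, the sketch below writes the regex out in full as a plain JavaScript literal and tries it on a few sample strings. The sample passwords are illustrative only, and the snippet does not use RegexGen.js itself.

```javascript
// Password pattern written out as a literal: three lookaheads require a
// lowercase letter, an uppercase letter and a digit; \w{6,10} limits the
// overall length to 6-10 word characters.
var passwordRegex = /^(?=.*?[a-z])(?=.*?[A-Z])(?=.*?\d)\w{6,10}$/;

console.log(passwordRegex.test('Passw0rd'));   // true
console.log(passwordRegex.test('password1'));  // false (no uppercase letter)
console.log(passwordRegex.test('Pw0'));        // false (shorter than 6 characters)
```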
Matching an IP Address
This example is taken from the book: Mastering Regular Expressions.
    var _ = require('regexgen.js');
Generates:
    /^([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])$/
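The idea of pattern reuse can be illustrated without the library: the octet sub-pattern is written once and repeated four times. This is a sketch in plain JavaScript with illustrative variable names; it is not how RegexGen.js builds the expression internally.

```javascript
// One octet (0-255), defined once as a string and reused four times.
var octet = '([01]?\\d\\d?|2[0-4]\\d|25[0-5])';
var ipRegex = new RegExp('^' + [octet, octet, octet, octet].join('\\.') + '$');

console.log(ipRegex.test('192.168.0.1'));           // true
console.log(ipRegex.test('256.1.1.1'));             // false (256 is out of range)
console.log('10.0.0.255'.match(ipRegex).slice(1));  // [ '10', '0', '0', '255' ]
```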
Matching Balanced Sets of Parentheses
This example is taken from the book: Mastering Regular Expressions.
    var _ = require('regexgen.js');
Generates:
    /\([^()]*(?:\([^()]*\)[^()]*)*\)/
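To see what this pattern matches, the sketch below writes it out in full as a plain JavaScript literal and runs it on a sample expression. The input strings are made up for illustration.

```javascript
// Matches a parenthesized group that may contain one inner level of
// parentheses, but no deeper nesting.
var parensRegex = /\([^()]*(?:\([^()]*\)[^()]*)*\)/;

var found = parensRegex.exec('f(a, (b), c) + g');
console.log(found[0]);                        // (a, (b), c)
console.log(parensRegex.exec('no parens'));   // null
```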
Matching Balanced Sets of Parentheses within Any Given Levels of Depth
This example is taken from the book: Mastering Regular Expressions.
    var _ = require('regexgen.js');
Given 1 level of nesting:
    var regex = _(
Generates:
    /\((?:[^()]|\([^()]*\))*\)/
Given 3 levels of nesting:
    var regex = _(
Generates:
    /\((?:[^()]|\((?:[^()]|\((?:[^()]|\([^()]*\))*\))*\))*\)/
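The way the pattern grows with each level can be sketched in plain JavaScript by wrapping the previous level's source in one more layer. The nestedParens() helper below is hypothetical and only mimics the shape of the generated output; it is not the RegexGen.js code for this example.

```javascript
// Hypothetical helper: build a regex that allows `depth` levels of nesting
// by repeatedly wrapping the pattern for the level below.
function nestedParens(depth) {
    var source = '\\([^()]*\\)';   // innermost level: no nested parentheses
    for (var i = 0; i < depth; i++) {
        source = '\\((?:[^()]|' + source + ')*\\)';
    }
    return new RegExp(source);
}

var sample = 'f(a(b(c(d))))';
console.log(nestedParens(1).source);           // \((?:[^()]|\([^()]*\))*\)
console.log(nestedParens(3).exec(sample)[0]);  // (a(b(c(d))))
console.log(nestedParens(1).exec(sample)[0]);  // (c(d))
```

With only one level allowed, the leftmost group that fits is the inner (c(d)), which is why the depth has to be chosen to suit the input.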
Matching an HTML Tag
This example is taken from the book: Mastering Regular Expressions.
    var _ = require('regexgen.js');
Generates:
    /<(?:"[^"]*"|'[^']*'|[^"'>])*>/
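The reason for the quoted-string alternatives becomes clearer when the pattern is compared with a naive one on a tag whose attribute value contains a > character. A sketch in plain JavaScript, with the tag pattern written out in full; the sample markup and the naive pattern are illustrative only.

```javascript
// Tag pattern: quoted attribute values are consumed as a whole, so a '>'
// inside quotes does not end the tag.
var tagRegex = /<(?:"[^"]*"|'[^']*'|[^"'>])*>/;
// Naive pattern for comparison: stops at the first '>'.
var naiveTagRegex = /<[^>]*>/;

var markup = '<a title="1 > 0" href=x>text</a>';
console.log(tagRegex.exec(markup)[0]);       // <a title="1 > 0" href=x>
console.log(naiveTagRegex.exec(markup)[0]);  // <a title="1 >
```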
Matching an HTML Link
This example is taken from the book: Mastering Regular Expressions.
    var _ = require('regexgen.js');
Generates:
    /<a\b([^>]+)>(.*?)<\/a>/gi
Here's how to iterate over all links (in a browser):
    var capture, guts, link, url, html = document.documentElement.outerHTML;
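For readers without a browser at hand, the iteration can be sketched in Node.js against a string of HTML. The sample markup and the shape of the collected objects are assumptions of this sketch, and the link pattern is written out in full as a literal rather than generated by RegexGen.js.

```javascript
// Link pattern with the g flag so exec() resumes after each match,
// and the i flag so <A ...> is matched as well.
var linkRegex = /<a\b([^>]+)>(.*?)<\/a>/gi;
var sampleHtml = '<p><a href="/one">One</a> and <A HREF="/two">Two</A></p>';

var links = [];
var linkCapture;
while ((linkCapture = linkRegex.exec(sampleHtml)) !== null) {
    // Group 1 holds the attributes ("guts") of the tag, group 2 the link text.
    links.push({ guts: linkCapture[1].trim(), text: linkCapture[2] });
}

console.log(links.length);   // 2
console.log(links[1].guts);  // HREF="/two"
console.log(links[1].text);  // Two
```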
Examining an HTTP URL
This example is taken from the book: Mastering Regular Expressions.
    var _ = require('regexgen.js');
Generates:
    /^https?:\/\/([^/:]+)(?::(\d+))?(\/.*)?$/
Here's a snippet that reports on a URL (in a browser):
    var capture = location.href.match( regex );
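Outside a browser, the same captures can be inspected with a small helper. The examineUrl() function below is hypothetical, and reporting a missing port as null and a missing path as "/" are assumptions of this sketch rather than behavior of RegexGen.js.

```javascript
var urlRegex = /^https?:\/\/([^/:]+)(?::(\d+))?(\/.*)?$/;

// Hypothetical helper: split an HTTP URL into host, port and path.
function examineUrl(url) {
    var capture = url.match(urlRegex);
    if (!capture) {
        return null;  // not an http(s) URL in the expected form
    }
    return {
        host: capture[1],
        port: capture[2] ? Number(capture[2]) : null,  // null when no port is given
        path: capture[3] || '/'                        // default to the root path
    };
}

console.log(examineUrl('http://example.com:8080/path/index.html?q=1'));
console.log(examineUrl('https://example.com'));
console.log(examineUrl('ftp://example.com'));  // null
```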
Validating a Hostname
This example is taken from the book: Mastering Regular Expressions.
    var _ = require('regexgen.js');
Generates:
    /^(?:[a-z0-9]\.|[a-z0-9][-a-z0-9]{0,61}[a-z0-9]\.)*(?:com|edu|gov|int|mil|net|org|biz|info|name|museum|coop|aero|[a-z][a-z])$/
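A short sketch in plain JavaScript shows the hostname pattern at work, written out in full as a literal. The sample hostnames are illustrative; note that the pattern as listed has no i flag, so it only accepts lowercase input.

```javascript
// Each label is 1-63 characters, starts and ends with a letter or digit,
// and is followed by a dot; the name ends with a known or two-letter TLD.
var hostnameRegex = /^(?:[a-z0-9]\.|[a-z0-9][-a-z0-9]{0,61}[a-z0-9]\.)*(?:com|edu|gov|int|mil|net|org|biz|info|name|museum|coop|aero|[a-z][a-z])$/;

console.log(hostnameRegex.test('www.example.com'));   // true
console.log(hostnameRegex.test('a.io'));              // true
console.log(hostnameRegex.test('-bad.example.com'));  // false (label starts with '-')
console.log(hostnameRegex.test('example.c0m'));       // false (unknown TLD)
```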
Parsing CSV Files
This example is taken from the book: Mastering Regular Expressions.
    var _ = require('regexgen.js');
Generates:
    /(?:^|,)(?:"([^"]*(?:""[^"]*)*)"|([^",]*))/
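Splitting a whole line into fields takes a loop around the pattern. The sketch below uses a slightly adapted variant rather than the generated regex: the leading (?:^|,) is replaced by a mandatory comma and a comma is prepended to the line, so no match is ever empty and the exec() loop cannot stall in JavaScript. The parseCsvLine() helper is hypothetical and assumes well-formed input.

```javascript
// Hypothetical helper: split one well-formed CSV line into its fields.
function parseCsvLine(line) {
    // Adapted variant of the pattern: every field is preceded by a comma.
    var fieldRegex = /,(?:"([^"]*(?:""[^"]*)*)"|([^",]*))/g;
    var fields = [];
    var field;
    while ((field = fieldRegex.exec(',' + line)) !== null) {
        if (field[1] !== undefined) {
            // Quoted field: turn doubled quotes back into single quotes.
            fields.push(field[1].replace(/""/g, '"'));
        } else {
            fields.push(field[2]);
        }
    }
    return fields;
}

var row = parseCsvLine('Ten Thousand,10000,,"10,000","It\'s ""10 Grand"", baby",10K');
console.log(row.length);  // 6
console.log(row[3]);      // 10,000
console.log(row[4]);      // It's "10 Grand", baby
```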
Change logs
2015-09-12:
- Remove UMD headers that support RequireJS and browser globals.
- The JSON object returned from the extract() method now omits the "0" property.
- Add replace() method to RegExp object.
2014-09-20:
- Fix CommonJS factory invoking bug.
2014-08-17:
- Rename RegExp.jsonExec() to extract().
2014-08-15:
- Character Classes now support nesting.
- Fix a bug in multiple(): multiple(5) returned /{,}/ but should return /{5,}/.
2014-08-10:
- Added RegExp.jsonExec() method, which returns a JSON object using capture names as properties.
Author
References
- RegexGen.js - JavaScript Regular Expression Generator
- Mastering Regular Expressions, 3rd Edition
- Mastering Lookahead and Lookbehind