{"id":285,"date":"2011-07-29T14:23:16","date_gmt":"2011-07-29T14:23:16","guid":{"rendered":"http:\/\/www.extradrm.com\/?p=285"},"modified":"2011-07-29T14:23:16","modified_gmt":"2011-07-29T14:23:16","slug":"regular-expressions-in-marcedit","status":"publish","type":"post","link":"https:\/\/www.extradrm.com\/?p=285","title":{"rendered":"Regular Expressions in MarcEdit"},"content":{"rendered":"<p><strong>Using Regular Expressions in MarcEditor Replacement Functions:<\/strong><\/p>\n<p>Regular Expressions can now be utilized in the Replace and Edit Subfield functions in the MarcEditor.<\/p>\n<p>This allows users to create complex search and replacement functions. In general, MarcEdit&#8217;s Regular Expression implementation is fairly straightforward. First, MarcEdit uses a replace\/with structure, meaning that the regular expression must be broken into a pattern and a replacement argument. Second, MarcEdit&#8217;s implementation is slightly different from the traditional Unix grep implementation. For example, suppose a field contains the following data:<\/p>\n<blockquote><p>aaabbb<\/p><\/blockquote>\n<p>And the user wants the final output to look like:<\/p>\n<blockquote><p>aaabxxbb<\/p><\/blockquote>\n<p>In Unix, one might use the following regular expression:<\/p>\n<blockquote><p>\/ab\/\\0xx\/<\/p><\/blockquote>\n<p>Using the Replace Function, this same expression would be written as follows:<\/p>\n<p>In the Find Text Textbox: <strong>ab<\/strong><\/p>\n<p>In the Replace With Textbox: <strong>\\01xx<\/strong><\/p>\n<p>Check the <strong>Use Regular Expression<\/strong> option<\/p>\n<p>In MarcEdit, regular expressions should use the format defined below:<\/p>\n<p><span style=\"color: #0000ff;\"><strong><span style=\"text-decoration: underline;\">Regular Expression Syntax in MarcEdit:<\/span><\/strong><\/span><\/p>\n<p><strong><span style=\"text-decoration: underline;\">char definition:<\/span><\/strong><\/p>\n<p><strong>. 
(period)<\/strong> : Matches any character, except the end-of-line.<\/p>\n<p><strong>^ (caret)<\/strong> : Matches the actual beginning-of-line position or the preceding line-delimiter character pair (also see [^] below for usage within a character class definition).<\/p>\n<p><strong>$ (dollar)<\/strong> : Matches the end-of-line position.<\/p>\n<p><strong>| (stile)<\/strong> : Specifies alternation (the OR operator), so that an expression on either side can match. <span style=\"text-decoration: underline; color: #0000ff;\">Precedence is from left to right<\/span>, as encountered in the expression.<\/p>\n<p><strong>? (question mark)<\/strong> : Specifies that zero or one match of the preceding sub-pattern is allowed. Cannot be used with a Tag.<\/p>\n<p><strong>+ (plus)<\/strong> : Specifies that one or more matches of the preceding sub-pattern are allowed. Cannot be used with a Tag.<\/p>\n<p><strong>* (asterisk)<\/strong> : Specifies that zero or more matches of the preceding sub-pattern are allowed. 
Cannot be used with a Tag.<\/p>\n<p>&nbsp;<\/p>\n<p><strong><span style=\"text-decoration: underline;\">Character Classes<\/span><\/strong><\/p>\n<p><strong>[ ] (square brackets)<\/strong> Identifies a user-defined class of characters, any of which will match: <span style=\"color: #0000ff;\">[abc] will match a, b, or c.<\/span> Only three kinds of special metacharacters are recognized within a class definition: <strong>the caret (^)<\/strong> for complemented characters, <strong>the hyphen (-)<\/strong> for a range of characters, and the following backslash escape sequences:<\/p>\n<p><strong>\\\\ \\- \\] \\e \\f \\n \\q \\r \\t \\v \\x##<\/strong><\/p>\n<p>Any other use of a backslash within a class definition yields an undefined operation that should be avoided.<\/p>\n<p>&nbsp;<\/p>\n<p><strong>[-] (hyphen)<\/strong> The hyphen identifies a range of characters to match.<br \/>\n<span style=\"color: #0000ff;\">For example, [a-f] will match a, b, c, d, e, or f.<\/span><\/p>\n<p>Characters in an individual range must occur in the natural order as they appear in the character set.<br \/>\n<span style=\"color: #0000ff;\">For example, [f-a] will match nothing.<\/span><\/p>\n<p><strong>Lists of characters, and one or more ranges of characters,<\/strong> <strong>may be intermixed<\/strong> in a single class definition. 
The start and end of a range may be specified by a literal character, or by one of the backslash escape sequences:<\/p>\n<p><span style=\"color: #0000ff;\">\\\\ \\- \\] \\e \\f \\n \\q \\r \\t \\v \\x##<\/span><\/p>\n<p>Any other use of a backslash within a class definition yields an undefined operation.<\/p>\n<p><span style=\"text-decoration: underline;\"><strong>Multiple ranges in a class<\/strong><\/span> are valid.<br \/>\n<span style=\"color: #0000ff;\">For example, [a-d2-5] matches a, b, c, d, 2, 3, 4, or 5.<\/span><\/p>\n<p>When the <strong>hyphen is escaped<\/strong>, it is treated as a literal.<br \/>\n<span style=\"color: #0000ff;\">For example, [a\\-c] is a list, not a range, and matches a, -, or c due to the backslash escape sequence.<\/span><\/p>\n<p><strong>[^] (caret)<\/strong> When the caret appears as the first item in a class definition, it identifies a complemented class of characters, which will not match.<br \/>\n<span style=\"color: #0000ff;\">For example, [^abc] matches any character except a, b, or c.<\/span><\/p>\n<p>A range can also be specified for the complemented class. For example, [^a-z] matches any character except a through z.<\/p>\n<p>A caret located in any position other than the first is treated as a literal character.<\/p>\n<p><strong><span style=\"text-decoration: underline;\">Tags\/sub-patterns<\/span><\/strong><\/p>\n<p><strong>( ) (parentheses)<\/strong> : Parentheses are used to match a Tag, or sub-pattern, within the full search pattern, and remember the match. 
The matched sub-pattern can be retrieved later in the mask, or in a replace operation, with \\01 through \\99, based upon the left-to-right position of the opening parentheses.<\/p>\n<p>Parentheses may also be used to force precedence of evaluation with the alternation operator.<br \/>\n<span style=\"color: #0000ff;\">For example, &#8220;(Begin)|(End)File&#8221; would match either &#8220;BeginFile&#8221; or &#8220;EndFile&#8221;,<\/span><br \/>\nbut without the Tag designations,<br \/>\n<span style=\"color: #0000ff;\">&#8220;Begin|EndFile&#8221; would only match either &#8220;BeginndFile&#8221; or &#8220;BegiEndFile&#8221;, because the alternation then applies only to the adjacent characters &#8220;n&#8221; and &#8220;E&#8221;.<\/span><\/p>\n<p><span style=\"text-decoration: underline;\">Note: Parentheses may not be used with ? + * as any match repetition could cause the tag value to be ambiguous.<\/span><br \/>\n<span style=\"color: #0000ff;\">To match repeated expressions, use parentheses followed by \\01*.<\/span><\/p>\n<p><strong><span style=\"text-decoration: underline;\">Escaped characters<\/span><\/strong><\/p>\n<p><strong>\\ (backslash)<\/strong>. The escape operator (single-character quote). The following character will be treated as a literal value rather than being interpreted as a special character. Note that the character following the backslash must actually be a special character, as follows:<\/p>\n<p><strong>\\b A word boundary<\/strong>. The start or end of a word, where a word is defined as a run of one or more alphabetic (A-Z or a-z), numeric (0-9), or underscore characters.<br \/>\n<span style=\"color: #0000ff;\">For example, &#8220;abc_123&#8221; is considered a single word and &#8220;abc-123&#8221; is considered two words.<\/span><\/p>\n<p><strong>\\c Case-sensitive search.<\/strong> Without the \\c operator, <span style=\"text-decoration: underline; color: #0000ff;\">the default is to ignore case when matching<\/span>. 
Unlike some other implementations of regular expressions, case-insensitivity is recognized in all operations, even a range of characters such as &#8220;[6-Z]&#8221;. The \\c operator may appear at any position in the mask.<\/p>\n<p>\\e Escape character<\/p>\n<p>\\f Formfeed character<\/p>\n<p>\\n Linefeed (or newline)<\/p>\n<p>\\q Double-quote mark (&quot;)<br \/>\n<span style=\"color: #0000ff;\">For example, &#8220;\\qHello\\q&#8221;.<\/span><\/p>\n<p>\\r Carriage-return character<\/p>\n<p><strong>\\s Shortest match character:<\/strong> The \\s flag causes the shortest matching string to be returned, <span style=\"text-decoration: underline; color: #0000ff;\">rather than the longest (the default)<\/span>.<br \/>\n<span style=\"color: #0000ff;\">For example, when searching for the mask &#8220;abc.*abc&#8221; in &#8220;abcdabcabc&#8221;, the default setting would return position 1 and length 10. With the \\s switch set, it returns position 1 and length 7. This option may cause a slight increase in processing time.<\/span><\/p>\n<p>\\t Horizontal tab character<\/p>\n<p>\\v Vertical tab character<\/p>\n<p><strong>\\x## Hex character code:<\/strong> Indicates that an ASCII code follows, given by two hexadecimal digits.<br \/>\n<span style=\"color: #0000ff;\">For example, \\xFF = ANSI 255. The ## value must be in the range 00 through FF (decimal 0 through 255).<\/span><\/p>\n<p><strong>\\## Tag number:<\/strong> Evaluated as the characters matched by tag number ## (<span style=\"color: #0000ff;\">where ## is in the range 01 through 99<\/span>, in decimal). Tags are implicitly numbered from 01 through 99, <span style=\"color: #0000ff;\">based upon the left-to-right position<\/span> of the left parenthesis. 
<span style=\"color: #0000ff;\">For example, &#8220;(...)w\\01&#8221; would match &#8220;abcwabc&#8221; or &#8220;456w456&#8221;.<\/span><\/p>\n<p>Tags cannot be forward-referenced &#8211; that is, if a reference is made to any Tag that is not yet defined, a non-match is presumed.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Using Regular Expressions in MarcEditor Replacement Functions: Regular Expressions can now be utilized in the Replace and Edit Subfield functions in the MarcEditor. This allows users to create complex search and replacement functions. In&#46;&#46;&#46;<\/p>\n","protected":false},"author":1,"featured_media":2850,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[22],"tags":[84],"youtube_video":null,"_links":{"self":[{"href":"https:\/\/www.extradrm.com\/index.php?rest_route=\/wp\/v2\/posts\/285"}],"collection":[{"href":"https:\/\/www.extradrm.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.extradrm.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.extradrm.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.extradrm.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=285"}],"version-history":[{"count":0,"href":"https:\/\/www.extradrm.com\/index.php?rest_route=\/wp\/v2\/posts\/285\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.extradrm.com\/index.php?rest_route=\/wp\/v2\/media\/2850"}],"wp:attachment":[{"href":"https:\/\/www.extradrm.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=285"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.extradrm.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=285"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.extradrm.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=285"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templat
ed":true}]}}