General Concept and Example
A regular expression is a sequence of characters that specifies a set of strings, which are said to match the regular expression.
For example, in one flavor of regular expression syntax:
gli..ering
specifies the set of strings that begin with "gli", followed by any two characters, followed by "ering".
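The idea can be tried on a few words before touching a real file. This is a minimal sketch: the word list is made up for illustration, and it assumes a grep that accepts the -E switch. Note that grep reports every line that contains a match anywhere in it, not only lines that match in their entirety.

```shell
# Sample words (invented for this illustration), one per line.
words='glittering
glimmering
glistering
glitter
lingering'

# Keep only the lines that contain a match for gli..ering:
# "gli", then any two characters, then "ering".
matches=$(printf '%s\n' "$words" | grep -E 'gli..ering')
printf '%s\n' "$matches"
```

The first three words are printed; "glitter" lacks the "ering" ending and "lingering" lacks the "gli" beginning.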
Some Systems That Use REs
- grep
- vi/emacs/other text editors
- most command shells (e.g., csh, bash, Windows shell)
- many programming languages
Unfortunately, this does not imply that all use the same syntax rules for REs.
For historical reasons, there are many variations (flavors) of RE syntax.
For the sake of sanity, we will restrict ourselves to the grep flavor.
RE Syntax Metacharacters
Most characters simply stand for themselves.
Metacharacters have special meaning:
- period (.) matches any single character
  a.c is matched by aac, abc, a)c, etc.
  b..t is matched by beet, best, boot, bart, etc.
- asterisk (*) matches zero or more occurrences of the preceding RE
  ab*c is matched by ac, abc, abbc, abbbc, etc.
  .* is matched by all strings
- plus (+) matches one or more occurrences of the preceding RE
  ab+c is matched by abc, abbc, abbbc, etc., but not by ac
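The three metacharacters above can be compared side by side on one small input. This is only a sketch with made-up sample strings; the variable names are arbitrary.

```shell
# Hypothetical sample strings, one per line.
samples='ac
abc
abbc
adc'

star=$(printf '%s\n' "$samples" | grep -E 'ab*c')  # a, zero or more b, then c
plus=$(printf '%s\n' "$samples" | grep -E 'ab+c')  # a, one or more b, then c
dot=$(printf '%s\n' "$samples" | grep -E 'a.c')    # a, exactly one character, then c

printf 'ab*c:\n%s\n' "$star"
printf 'ab+c:\n%s\n' "$plus"
printf 'a.c:\n%s\n' "$dot"
```

Here ab*c selects ac, abc and abbc; ab+c drops ac; and a.c selects only abc and adc, because the period must consume exactly one character.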
RE Syntax Examples
$ grep -E MobyDick.txt
a fine frosty night; how Orion glitters; what northern lights! Let them
glittering teeth resembling ivory saws; others were tufted with knots of
footfall in the passage, and saw a glimmer of light come into the room
glittering in the clear, cold air. Huge hills and mountains of casks on
glittering expression--all this sufficiently proclaimed him an inheritor
looked celestial; seemed some plumed and glittering god uprising from
suddenly relieved hull rolled away from it, to far down her glittering
the wife sat frozen at the window, with tearless eyes, glitteringly
glittering fiddle-bows of whale ivory, were presiding over the hilarious
to glimmer into sight. Glancing upwards, he cried: "See! see!" and once
At the first faintest glimmering of the dawn, his iron voice was heard
leeward; and Ahab heading the onset. A pale, death-glimmer lit up
glittering mouth yawned beneath the boat like an open-doored marble
methodic intervals, the whale's glittering spout was regularly announced
the moment, intolerably glittered and glared like a glacier; and
Note the use of the -E switch in the example here. This tells grep to use certain extensions to the basic RE syntax; rather than fuss about the difference, we will simply invoke grep with this switch in all cases.
RE Syntax Examples
$ grep -n -E fe+d MobyDick.txt
278: Tho' stuffed with hoops and armed with ribs of whale."
746:I stuffed a shirt or two into my old carpet-bag, tucked it under my arm,
1267:Whether that mattress was stuffed with corn-cobs or broken crockery,
1381:he puffed out great clouds of tobacco smoke. The next moment the light
1822:But Faith, like a jackal, feeds among the tombs, and even from these
2644:How I snuffed that Tartar air!--how I spurned that turnpike earth!--that
2903:Hosea's brindled cow feeding on fish remnants, and marching along the
4929:own. Yet now, federated along one keel, what a set these Isolatoes were!
. . .

$ grep -E travel+er MobyDick.txt
the great New England traveller, and Mungo Park, the Scotch one; of all
palsied universe lies before us a leper; and like wilful travellers in
more travellers than in any other part.
. . .

Note the use of the -n switch in the first example here. This tells grep to report line numbers along with the matching lines.
RE Syntax Metacharacters
- question mark (?) matches zero or one occurrence of the preceding RE
  ab?c is matched by ac and abc, but not by abbc
  b.?t is matched by bt, bat, bet, bxt, etc.
- logical or (|) matches the RE before | or the RE after |
  abc|def is matched by abc and def and nothing else
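A quick check of the question mark and the logical or, again as a sketch on invented sample strings. The RE containing | is single-quoted so that the shell does not treat | as a pipe.

```shell
# Hypothetical sample strings, one per line.
samples='ac
abc
abbc
def'

optional=$(printf '%s\n' "$samples" | grep -E 'ab?c')    # a, at most one b, then c
either=$(printf '%s\n' "$samples" | grep -E 'abc|def')   # abc or def

printf 'ab?c:\n%s\n' "$optional"
printf 'abc|def:\n%s\n' "$either"
```

The first search selects ac and abc but not abbc; the second selects abc and def.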
RE Syntax Examples
$ grep -E fee?d MobyDick.txt
Tho' stuffed with hoops and armed with ribs of whale."
I stuffed a shirt or two into my old carpet-bag, tucked it under my arm,
Whether that mattress was stuffed with corn-cobs or broken crockery,
he puffed out great clouds of tobacco smoke. The next moment the light
. . .

$ grep -E 'equal|same' MobyDick.txt
and some other articles of the same nature in their boats, in order to
"And pray, sir, what in the world is equal to it?" --EDMUND BURKE'S
to have indirectly hit upon new clews to that same mystic North-West
nearly the same feelings towards the ocean with me.
. . .
Note the use of single-quotes around the RE in the second example; this is absolutely necessary in the Unix shell because the '|' character has special meaning to the shell and that takes priority; the same applies in the Windows shell except that double-quotes are used.
RE Syntax Metacharacters
- caret (^) used outside brackets, matches only at the beginning of a line
  ^D.* is matched by any line beginning with D
  inside brackets, ^ has a different meaning; see the entry for square brackets
- dollar sign ($) matches only at the end of a line
  .*d$ is matched by any line ending with a d
RE Syntax Examples
$ grep -E ^equal MobyDick.txt
equalled by the realities of the whalemen.
equally desolate Salisbury Plain in England; if casually encountering
equal to that of the brain. Under all these circumstances, would it be
equally doubted the story of Hercules and the whale, and Arion and the
. . .

$ grep -E equal$ MobyDick.txt
twenty pounds; so that the whole rope will bear a strain nearly equal
The first example does not work properly in the Windows shell unless you put double-quotes around the RE.
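The two anchors can be contrasted on a three-line input. This is a sketch with invented lines; the REs are single-quoted here, which sidesteps any question of how a particular shell treats ^ and $.

```shell
# Hypothetical input lines.
lines='equal parts
nearly equal
unequal shares'

at_start=$(printf '%s\n' "$lines" | grep -E '^equal')  # "equal" at the beginning of a line
at_end=$(printf '%s\n' "$lines" | grep -E 'equal$')    # "equal" at the end of a line

printf '%s\n' "$at_start"
printf '%s\n' "$at_end"
```

Only "equal parts" begins with equal and only "nearly equal" ends with it; "unequal shares" contains the word but satisfies neither anchor.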
RE Syntax Metacharacters
- backslash (\) escapes other metacharacters
  now\. is matched by "now."
- square brackets [] specify a set of characters; any character in the set will match
  [aeiou] is matched by any vowel
  [a-z] is matched by any lower-case letter
  ^ specifies the complement (negation) of the set
  [^aeiou] is matched by any character but 'a', 'e', 'i', 'o' and 'u'
- parentheses () form a group of characters to be treated as a unit
  a(bc)+ is matched by abc, abcbc, abcbcbc, etc.
- braces {} specify the number of repetitions of an RE
  [a-z]{3} is matched by any three lower-case letters
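Groups, repetition counts and negated sets can be exercised together. The sketch below uses made-up strings; the anchors ^ and $ are added to the second RE so that it describes the whole line rather than any three-letter stretch inside it.

```shell
# Hypothetical sample strings, one per line.
samples='abc
abcbc
ab
cat
c4t'

grouped=$(printf '%s\n' "$samples" | grep -E 'a(bc)+')        # a, then one or more "bc" units
three=$(printf '%s\n' "$samples" | grep -E '^[a-z]{3}$')      # a line of exactly three lower-case letters
not_lower=$(printf '%s\n' "$samples" | grep -E '[^a-z]')      # any character outside a-z

printf 'a(bc)+:\n%s\n' "$grouped"
printf '^[a-z]{3}$:\n%s\n' "$three"
printf '[^a-z]:\n%s\n' "$not_lower"
```

The group selects abc and abcbc, the counted set selects abc and cat, and the negated set selects only c4t, because of the digit.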
RE Syntax Examples
$ grep -E 'equal(ly)?$' MobyDick.txt
twenty pounds; so that the whole rope will bear a strain nearly equal
even now beholding him; aye, and into the eye that is even now equally

$ grep -E '^f[aeiou]t' MobyDick.txt
fathoms down, and 'the weeds were wrapped about his head,' and all the
father was a High Chief, a King; his uncle a High Priest; and on the
future investigators, who may complete what I have here but begun. If
. . .
fetch another for a considerable time. That is to say, he would then
fathoms of rope; as, after deep sounding, he floats up again, and shows
. . .
fitted to sustain the weight of an almost solid mass of brick and
fatal cork, forth flew the fiend, and shrivelled up his home. Now, for
. . .
RE Syntax Examples
$ grep -E '^f[aeiou]+t' MobyDick.txt
foot of it. But I got a dreaming and sprawling about one night, and
footfall in the passage, and saw a glimmer of light come into the room
fathoms down, and 'the weeds were wrapped about his head,' and all the
feet high; consisting of the long, huge slabs of limber black bone taken
features of the leviathan, most naturalists have recognised him for one.
future investigators, who may complete what I have here but begun. If
faithfully narrated here, as they will not fail to elucidate several
fitted to sustain the weight of an almost solid mass of brick and
. . .

$ grep -E 'br(ing){2}' MobyDick.txt
myself involuntarily pausing before coffin warehouses, and bringing up
justified his bringing his harpoon into breakfast with him, and using it
bringing in good interest.
. . .
RE Syntax Examples
According to the man page for grep, the repetition expression {,m} should cause a match to m or fewer occurrences of the RE to which it is applied.
So, the following should match 1, 10, 100, 1000 and 10000:
$ grep -E '1(0){,4}' SomeNumbers.txt
However, the GNU implementation of grep has not always implemented this as described, and {,m} is a GNU extension rather than part of the POSIX syntax.
Instead, you should use {0,m}, as in:
$ grep -E '1(0){0,4}' SomeNumbers.txt
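One caution when testing a bounded repetition: since zero occurrences are allowed, any line that merely contains a 1 will contain a match. The sketch below therefore adds the -x switch (match the whole line), which is listed in the grep man page; the number list is made up and stands in for SomeNumbers.txt.

```shell
# Hypothetical stand-in for SomeNumbers.txt.
numbers='1
10
100
1000
10000
100000
12'

# -x requires the RE to match the entire line, so 100000 (five zeros)
# and 12 are rejected instead of matching on their leading "1".
bounded=$(printf '%s\n' "$numbers" | grep -x -E '1(0){0,4}')
printf '%s\n' "$bounded"
```

With -x the output is exactly 1, 10, 100, 1000 and 10000. Without it, all seven lines would be reported.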
RE Syntax Metacharacters
- word boundaries (\< and \>) specify to only match entire words (in a loose sense)
  \<fat\> is matched by "fat" but not "father" or "fathom"

$ grep -E '\<fat\>' MobyDick.txt
nothing certain. They grow exceeding fat, insomuch that an incredible
DUTCH SAILOR. Grand snoozing to-night, maty; fat night for that. I
exceeding richness. He is the great prize ox of the sea, too fat to be
. . .
Of course, grep doesn't "understand" English. Word boundaries are indicated by the beginnings and ends of alphanumeric sequences of characters.
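The "loose sense" of a word can be seen on a few invented lines. This sketch assumes a grep that supports the \< and \> boundaries, as GNU grep does.

```shell
# Hypothetical input lines.
lines='a fat cat
my father
fathoms down
fat-free'

whole=$(printf '%s\n' "$lines" | grep -E '\<fat\>')
printf '%s\n' "$whole"
```

"a fat cat" is selected, and so is "fat-free": the hyphen is not alphanumeric, so it ends the word as far as grep is concerned. "my father" and "fathoms down" are rejected because letters follow "fat" directly.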
More Regular Expressions
Ending a sentence with a preposition is something up with which I will not put.
  W. Churchill
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
  Jamie Zawinski
How can you search a file for sentences that end with a preposition?
It seems we need to determine two things:
- what are prepositions?
- what characters might mark the end of a sentence?
The second question seems to be fairly easy:
. ! ?
Some sentences end with a double-quotation mark, but that will probably be preceded by one of the marks above. And some end with an ellipsis…
This suggests:
[.?!]|\.\.\.
So, what are prepositions? A preposition relates a noun or pronoun to another word in a sentence.
One source says there are 150 of them and gives the following partial list:
aboard about above across after against
along amid among anti around as
at before behind below beneath beside
besides between beyond but by concerning
considering despite down during except excepting
excluding following for from in inside
into like minus near of off
on onto opposite outside over past
per plus regarding round save since
than through to toward towards under
underneath unlike until up upon versus
via with within without
Allegedly, the most common ones are:
to, of, in, for, on, with, at, by, from, up, about, into, over, after
This suggests the regular expression used below:
$ grep -E '(\<to\>|\<of\>|\<in\>|\<for\>|\<on\>|\<with\>|\<at\>|\<by\>|\<from\>|\<up\>|\<about\>|\<into\>|\<over\>|\<after\>)([.?!]|\.\.\.)' MobyDick.txt
once a whale in Spitzbergen that was white all over." --A VOYAGE TO
up a pair of as pretty rainbows as a Christian would wish to look at.
they possibly can without falling in. And there they stand--miles of
penny that I ever heard of. On the contrary, passengers themselves must
one lodges a looker on.
the tidiest, certainly none of the finest. I began to twitch all over.
leaving a little interval between, for my back to settle down in. But I
till spoken to. Holding a light in one hand, and that identical New
out a sort of tomahawk, and a seal-skin wallet with the hair on. Placing
he never would have dreamt of getting under the bed to put them on. At
be sure there is more in that man than you perhaps think for.
night previous, and whom I had not as yet had a good look at. They were
to. Then the Captain knows that Jonah is a fugitive; but at the same
an adventurous whaleman to embark from. He at once resolved to accompany
whom I now companied with.
. . .
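A shorter version of the same search can be assembled from a variable and tried on a few invented sentences. This is a sketch, not the search used on the novel: the names preps and pattern are arbitrary, only seven of the prepositions are included, and the word-boundary markers are factored out around the group, which is equivalent to writing them around each word.

```shell
# A few of the common prepositions, joined with the logical or.
preps='to|of|in|for|on|with|at'

# Whole-word preposition followed by . ? ! or an ellipsis.
pattern='\<('"$preps"')\>([.?!]|\.\.\.)'

# Hypothetical test sentences.
sentences='What are you looking at?
He put the hat on.
She went to the store.
This is the one I asked for... maybe.
He walked into the room.
She dropped the baton.'

found=$(printf '%s\n' "$sentences" | grep -E "$pattern")
printf '%s\n' "$found"
```

Three sentences are reported: the ones ending in "at?", "on." and "for...". The word boundaries keep "baton." from counting as "on." and keep "into" from counting as "in" or "to".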
POSIX Character Classes
The POSIX definition of extended regular expressions includes definitions of some classes of characters, including:
Let's use a character class to look for digits in a file (note the syntax):
$ grep -E [[:digit:]] MobyDick.txt
Last Updated: January 3, 2009
Posting Date: December 25, 2008 [EBook #2701]
Release Date: June, 2001
In chapters 24, 89, and 90, we substituted a capital L for the symbol
NARRATIVE TAKEN DOWN FROM HIS MOUTH BY KING ALFRED, A.D. 890.
GREENLAND, A.D. 1671 HARRIS COLL.
"Several whales have come in upon this coast (Fife) Anno 1652, one
informed), besides a vast quantity of oil, did afford 500 weight of
STRAFFORD'S LETTER FROM THE BERMUDAS. PHIL. TRANS. A.D. 1668.
northward of us." --CAPTAIN COWLEY'S VOYAGE ROUND THE GLOBE, A.D. 1729.
ON BANKS'S AND SOLANDER'S VOYAGE TO ICELAND IN 1772.
--THOMAS JEFFERSON'S WHALE MEMORIAL TO THE FRENCH MINISTER IN 1778.
"In 40 degrees south, we saw Spermacetti Whales, but did not take
"In the year 1690 some persons were on a high hill observing the
SAID VESSEL. NEW YORK, 1821.
of this one whale, amounted altogether to 10,440 yards or nearly six
--THOMAS BEALE'S HISTORY OF THE SPERM WHALE, 1839.
--FREDERICK DEBELL BENNETT'S WHALING VOYAGE ROUND THE GLOBE, 1840.
October 13. "There she blows," was sung out from the mast-head.
--J. ROSS BROWNE'S ETCHINGS OF A WHALING CRUIZE. 1846.
. . .
Let's use character classes to look for strings that consist of one or more alphabetic characters followed immediately by one or more digits:
$ grep -E '[[:alpha:]]+[[:digit:]]+' MobyDick.txt
upwards of L1,000,000? And lastly, how comes it that we whalemen of
Savesoul's income of L100,000 seized from the scant bread and cheese
without any of Savesoul's help) what is that globular L100,000 but a
fish high and dry, promising themselves a good L150 from the precious
PROVIDED IN PARAGRAPH F3. YOU AGREE THAT THE FOUNDATION, THE
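The two classes used above behave the same way on a small made-up input, which makes the double-bracket syntax easier to see: the class name [:digit:] or [:alpha:] goes inside an ordinary bracket expression. The sample strings below are invented, loosely echoing the matches from the novel.

```shell
# Hypothetical sample strings, one per line.
samples='L100
F3
2009
whale
A.D. 890'

any_digit=$(printf '%s\n' "$samples" | grep -E '[[:digit:]]')
alpha_then_digit=$(printf '%s\n' "$samples" | grep -E '[[:alpha:]]+[[:digit:]]+')

printf '[[:digit:]]:\n%s\n' "$any_digit"
printf '[[:alpha:]]+[[:digit:]]+:\n%s\n' "$alpha_then_digit"
```

Every line except "whale" contains a digit, but only L100 and F3 have a letter immediately followed by a digit; in "A.D. 890" a period and a space separate the letters from the number.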
Why the POSIX Classes?
Suppose you need to use a regular expression for a search on a system that does not use ASCII encoding for characters?
The order in which character codes are assigned to characters may not be compatible with ASCII.
So, it could be that A-Z doesn't define a valid range that includes all capital letters and nothing else.
Now, you might be able to figure out a workable range specification… but you wouldn't have a portable solution.
The POSIX classes give us a way to manage these issues in a portable manner.
Fortunately, GNU grep does support the POSIX classes described earlier.
What do you think the following searches will find?
$ grep -E '\<the\>\<Pequod\>' MobyDick.txt
$ grep -E '\<[Cc]aptain\>\<Ahab\>' MobyDick.txt
$ grep -E '\<[Cc]aptain\> \<Ahab\>' MobyDick.txt
$ grep -E '\<better\> \<than\> \<nothing\>' MobyDick.txt
$ grep -E 'better than nothing' MobyDick.txt
grep man(ual) Page
-i, --ignore-case
    Ignore case distinctions in both the PATTERN and the input files.
-v, --invert-match
    Invert the sense of matching, to select non-matching lines.
-w, --word-regexp
    Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore.
-x, --line-regexp
    Select only those matches that exactly match the whole line.
-c, --count
    Suppress normal output; instead print a count of matching lines for each input file. With the -v, --invert-match option (see below), count non-matching lines.
-o, --only-matching
    Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
grep man(ual) Page
-m NUM, --max-count=NUM
    Stop reading a file after NUM matching lines. If the input is standard input from a regular file, and NUM matching lines are output, grep ensures that the standard input is positioned to just after the last matching line before exiting, regardless of the presence of trailing context lines. This enables a calling process to resume a search. When grep stops after NUM matching lines, it outputs any trailing context lines. When the -c or --count option is also used, grep does not output a count greater than NUM. When the -v or --invert-match option is also used, grep stops after outputting NUM non-matching lines.
-n, --line-number
    Prefix each line of output with the 1-based line number within its input file.
-A NUM, --after-context=NUM
    Print NUM lines of trailing context after matching lines. Places a line containing a group separator (--) between contiguous groups of matches. With the -o or --only-matching option, this has no effect and a warning is given.
-B NUM, --before-context=NUM
    Print NUM lines of leading context before matching lines. Places a line containing a group separator (--) between contiguous groups of matches. With the -o or --only-matching option, this has no effect and a warning is given.
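Several of the switches listed above can be tried on a three-line input. This is a small sketch with invented text; only switches quoted from the man page are used.

```shell
# Hypothetical input text.
text='Captain Ahab
the captain spoke
whale'

count=$(printf '%s\n' "$text" | grep -i -c 'captain')      # -i ignores case, -c counts matching lines
inverted=$(printf '%s\n' "$text" | grep -v 'captain')      # -v selects lines WITHOUT a match
numbered=$(printf '%s\n' "$text" | grep -n 'whale')        # -n prefixes the line number
only=$(printf '%s\n' "$text" | grep -o -E '[[:upper:]][[:lower:]]+')  # -o prints just the matched parts

printf '%s\n' "$count" "$inverted" "$numbered" "$only"
```

The count is 2 because -i lets "Captain" match as well. Without -i, the -v search keeps "Captain Ahab", since the search is case-sensitive. The -n search prints 3:whale, and the -o search prints Captain and Ahab on separate lines.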




