Friday, October 30, 2015

r in regular expression

 the 'r' means the the following is a "raw string".
ie. backslash characters are treated literally instead of signifying special treatment of the following character.
so '\n' is a single newline
and r'\n' is two characters - a backslash and the letter 'n'
another way to write it would be '\\n' because the first backslash escapes the second
an equivalent way of writing this
print (re.sub(r'(\b\w+)(\s+\1\b)+', r'\1', 'hello     there      there'))
is
print (re.sub('(\\b\\w+)(\\s+\\1\\b)+', '\\1', 'hello     there      there'))
Because of the way Python treats characters that are not valid escape characters, not all of those double backslashes are necessary - eg '\s'=='\\s' however the same is not true for '\b' and '\\b'. My preference is to be explicit and double all the backslashes.


No comments:

Post a Comment