The Core

Topic started by: rcjordan on August 26, 2025, 11:54:25 AM

Title: the most common regular expression is /.*/.
Post by: rcjordan on August 26, 2025, 11:54:25 AM
I suck at regex. Please interpret.

Note that the last period *might* only be there because it was at the end of a sentence; if so, the regex is /.*/
Title: Re: the most common regular expression is /.*/.
Post by: ergophobe on August 27, 2025, 09:13:20 PM
It depends on your regex flavor.

In POSIX, it would be "match a string that has a slash, zero or more characters, another slash, and one character", so the following would match:

https://a
user/bin/a

But I'm guessing this is so popular because some regex flavors use / as delimiters when invoking the regex in a programming language. So properly speaking, the / are not part of the regex and therefore the final . would be punctuation.

For example
https://www.php.net/manual/en/regexp.reference.delimiters.php

This *could* be a POSIX-style regex (like if you were running grep), but that seems very unlikely because it would be a very strange pattern to be popular.


So my assumption is that it is a PCRE expression that includes the delimiters with a simple period after.

So that would mean "match anything at all, including nothing, up to the next newline."

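Note that in some flavors or modes (for example, with a "dot matches newline" option switched on), a . also matches a newline, but again, I'm assuming that's not the case for the flavor in question.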
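
A minimal Python sketch of the two readings above. Python's re module is only a stand-in here (the thread doesn't say which flavor the statistic came from), and the variable names and the extra test strings "hello" and "a/b/" are made up for illustration. Python has no regex delimiters, so a slash in the pattern is always a literal slash, which makes it easy to compare both interpretations side by side.

```python
import re

# Reading 1: the slashes are literal characters and the final period is part of the pattern.
literal = re.compile(r"/.*/.")

# Reading 2: the slashes are delimiters and the final period is just punctuation.
delimited = re.compile(r".*")

# The two example strings from the post, plus two hypothetical ones that should fail.
literal_hits = {
    s: bool(literal.search(s))
    for s in ["https://a", "user/bin/a", "hello", "a/b/"]
}

# "Anything at all, including nothing, up to the next newline."
empty_ok = delimited.fullmatch("") is not None
first_line = delimited.search("one\ntwo").group(0)

# With the dot-matches-newline option (re.DOTALL in Python) the match runs past the newline.
both_lines = re.compile(r".*", re.DOTALL).search("one\ntwo").group(0)

print(literal_hits)
print(empty_ok, repr(first_line), repr(both_lines))
```

Under the first reading a string needs two slashes followed by at least one more character; under the second, every string matches, including the empty one.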
Title: Re: the most common regular expression is /.*/.
Post by: ergophobe on August 27, 2025, 09:14:39 PM
https://regex101.com/
Title: Re: the most common regular expression is /.*/.
Post by: ergophobe on August 27, 2025, 09:15:57 PM
Prompt:

Quote: Please explain the following regular expression in plain language and include examples of strings that will and will not match (note that the final . is part of the regular expression and not punctuation for the sentence): /.*/.


Claude 3 Haiku - Anthropic (2.233s - 913 credits)

Quote: The regular expression /.*/. is used to match any string of characters, including zero characters.

Here's a breakdown of the different components:

/ - This is the start of the regular expression.
.* - This matches any character (except for a newline) zero or more times. The . matches any single character, and the * allows it to repeat any number of times.
/ - This is the end of the regular expression.
. - This final period is also part of the regular expression and matches any single character.
Examples of strings that will match this regular expression:

"hello"
"123456789"
""
" "
Examples of strings that will not match this regular expression:

Strings that contain a newline character, like "hello\nworld".
Strings that do not contain any characters, like undefined or null.
The regular expression /.*/. is a very broad pattern that will match almost any non-empty string. It's often used as a starting point for more complex regular expressions, or in cases where you want to match anything at all.

Note that despite the prompt, Claude assumes that the / is a delimiter and thus assumes that the final period cannot be part of the regex and ignores it when it gives examples that would and wouldn't match.
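
The mismatch can be checked directly. This is a rough sketch in Python (an assumption, since neither the prompt nor the answer names a flavor) that runs the strings from the quoted answer against both the pattern as literally given and the bare .* the answer seems to have had in mind. The variable names are illustrative only.

```python
import re

literal = re.compile(r"/.*/.")  # the pattern exactly as given, final period included
bare = re.compile(r".*")        # slashes read as delimiters, final period dropped

# The strings the quoted answer says "will match".
claimed_matches = ["hello", "123456789", "", " "]

literal_results = [bool(literal.search(s)) for s in claimed_matches]
bare_results = [bool(bare.fullmatch(s)) for s in claimed_matches]

# The string the answer says "will not match": an unanchored search still
# finds "hello"; only a whole-string match fails because of the newline.
newline_search = bare.search("hello\nworld").group(0)
newline_full = bare.fullmatch("hello\nworld")

print(literal_results, bare_results)
print(repr(newline_search), newline_full)
```

None of the claimed matches contains a slash, so none of them matches the pattern as literally written. They only match once the slashes and the final period are dropped.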
Title: Re: the most common regular expression is /.*/.
Post by: ergophobe on August 27, 2025, 09:23:38 PM
BTW, some days I used to write hundreds of regexes because I had years and years of plain-text transcripts, thousands of pages if printed (maybe 5000, double-spaced), with 16th-century spelling (i.e. great variation in spelling, especially for names).

Then I would start getting into long, involved things with negative lookbehinds and crazy stuff like that.

Here's a "simple" example (yes, long, but nevertheless simple) I would use with grad students when trying to teach them to use regex in their work:

Quote: (Lou?[iy]s,?.(le.)?barbier|barbier.{1,30}Lou?[iy]s|Lou?[iy]s.Tr[aeou][nu]ch[aeoui][nuv]a|grand?.Lou?[iy]s)
- case insensitive
- match newlines

Which is like separate searches for:
   Lou?[iy]s,?.(le.)?barbier
   barbier.{1,30}Lou?[iy]s
   Lou?[iy]s.Tr[aeou][nu]ch[aeoui][nuv]a
   grand?.Lou?[iy]s

Which yields 23 matches

       1  barbier de Chanssiez, et Loys
       1  Barbier et par sa femme.  Loys
       1  BARBIER, Louys
       2  Grand Loys
       3  Loys Barbier
       2  Lois Le Barbier
       9  Loys Le Barbier
       1  Loys Tronchena
       3  Loys Tronchona
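
For anyone who wants to try the pattern, here is a small Python sketch under a few assumptions: Python's re module stands in for whatever tool was originally used, re.IGNORECASE and re.DOTALL are taken to be the equivalents of the "case insensitive" and "match newlines" options, and the sample strings are just the distinct matches listed above plus two made-up ones ("Pierre Barbier" and a name split across a line break). It does not reproduce the count of 23, since the transcripts themselves aren't in the thread.

```python
import re

# The pattern from the post, split over two string literals for readability.
NAME_RE = re.compile(
    r"(Lou?[iy]s,?.(le.)?barbier|barbier.{1,30}Lou?[iy]s"
    r"|Lou?[iy]s.Tr[aeou][nu]ch[aeoui][nuv]a|grand?.Lou?[iy]s)",
    re.IGNORECASE | re.DOTALL,  # assumed equivalents of the two options listed
)

samples = [
    "barbier de Chanssiez, et Loys",
    "Barbier et par sa femme.  Loys",
    "BARBIER, Louys",
    "Grand Loys",
    "Loys Barbier",
    "Lois Le Barbier",
    "Loys Le Barbier",
    "Loys Tronchena",
    "Loys Tronchona",
    "Pierre Barbier",  # hypothetical non-match: no Loys/Louis anywhere near
]

matched = [s for s in samples if NAME_RE.search(s)]
unmatched = [s for s in samples if not NAME_RE.search(s)]

# Hypothetical case of a name broken across lines: the dot only bridges the
# newline because of re.DOTALL.
split_name = "Loys\nBarbier"
with_dotall = bool(NAME_RE.search(split_name))
without_dotall = bool(re.search(NAME_RE.pattern, split_name, re.IGNORECASE))

print(len(matched), unmatched)
print(with_dotall, without_dotall)
```

One thing to watch when running this over continuous text rather than isolated strings: .{1,30} is greedy, so the second alternative will stretch to the last Loys within 30 characters of a barbier, which can swallow the start of a neighbouring name.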