Masthash

#regex

Thomas Kujawa
20 hours ago

It's already late. :mastosleeping:

I'd like to change the function described at https://torstenlandsiedel.de/2018/06/17/custom-validation-fuer-postleitzahlen-bei-contact-form-7/ so that an error appears when the input is not a valid first name. regex: ^(\pL+[ -])*\pL+$/u

#WordPress #CF7 #regex #customValidation

Sam Steiner and 487 others
1 day ago

Sunday #Regex Fun. 🎉

@JordiGH this is like what started me on #Perl.

The O'Reilly #regex book used Perl in the examples, so I thought Perl was how one used regexes

Granted, Randall didn’t do me any favors with this comic https://xkcd.com/208/

Jamie Magee :unverified:
6 days ago

It is not regex
There's no way it is regex
It was a regex

#haiku #regex #debugging

Kalamata Hari
6 days ago

@SnoopJ A basic rule of backtracking #regex engines (including perl's) is that they always find the leftmost match. Also, at any given starting position, the greediness of quantifiers only affects how much of the string is matched, not whether the regex matches.

This is because the regex engine proper only checks for a match at a given position. There is an outer loop that invokes the regex engine for every possible starting position in the string, from left to right. (At least notionally. There are some optimizations for anchored regexes, etc.) Since a match is possible at offset 0, that is what is found.
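
The outer loop described above can be imitated in a few lines of Python. This is only a sketch of the idea, not how any real engine is implemented; `leftmost_search` is a hypothetical helper name, and the sample string is made up for illustration.

```python
import re

def leftmost_search(pattern, text):
    """Try an anchored match at every offset, left to right (illustrative only)."""
    compiled = re.compile(pattern)
    for start in range(len(text) + 1):
        match = compiled.match(text, start)  # match only at this offset
        if match is not None:
            return match  # the first offset that works wins
    return None

# Greediness changes how much is matched, not where the match starts.
greedy = leftmost_search(r"a.*b", "xaabab")
lazy = leftmost_search(r"a.*?b", "xaabab")
print(greedy.span(), lazy.span())  # (1, 6) (1, 4)
```

Both variants start at offset 1; only the end of the match differs.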

David Zaslavsky
6 days ago

@SnoopJ Ooh that's an interesting case. I mean, I kind of understand why it happens from the perspective of the regex engine: because it returns the first match it can find working from the beginning of the string, rather than searching the whole string for the globally "best" match. So yeah, I would expect that regex engines generally behave the same way across different implementations. It would reduce efficiency quite a bit to have them behave otherwise in general, although it'd be nice if there was a simple way to get that behavior when you want it. (you could cobble together something using `re.match()` with hardcoded start positions, I think)

It is an easy mistake to make without realizing it - I know from experience! The trick I've adopted to avoid getting caught by that is to explicitly exclude closing delimiters, i.e. use `<[^>]+>` instead of `<.+?>`.

#Python #regex
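
A small Python sketch of the trick mentioned above, using a made-up input string. The assumption here is that the tag is followed by more pattern (`foo`), which is where the lazy version tends to surprise people.

```python
import re

text = "<a><b>foo"

# The lazy dot may still run across a closing '>' if that lets the rest match.
lazy = re.search(r"<.+?>foo", text)

# A negated class can never cross '>', so only the last tag qualifies.
strict = re.search(r"<[^>]+>foo", text)

print(lazy.group())    # <a><b>foo
print(strict.group())  # <b>foo
```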

Cegorach
1 week ago

#RegEx - the cause of and solution to all problems!

https://github.com/philc/emacs-config/blob/e15bee0d18b8325b717523ce172e755085404b39/.emacs.d/init.el#L1722
#regex and idempotence
things which are idempotent in #emacs
defun.
setq to a constant.
add-to-list with a constant.
add-hook, but preferably adding a symbol rather than a lambda expression.
enabling/disabling a minor mode.
wrapping some of the above in conditions.
:blobcatgooglyshrug:

Kevin Lossner
2 weeks ago

Next week's Thursday 15:00-16:30 CET memoQ&A for the course "memoQuickies Resource Camp" will be an open public session to handle any questions regarding #memoQ's auto-translation rules, the #Regex Assistant and segmentation rules (which will actually be the main topic for two weeks after we finish off the Regex Assistant libraries). Everyone is welcome, regardless of whether they are enrolled in the course.

https://www.linkedin.com/events/openmemoq-aforauto-translationr7110758313268580352/

#xlt #translation #TranslationStudies

icons for auto-translation rules, the Regex Assistant and segmentation rules
Kevin Lossner
2 weeks ago

My lecture from this afternoon, "The #memoQ #Regex Assistant Revisited" is available with the slides for viewing and/or download for a while (a week? two weeks? a month? until Ukraine kicks the muZkovian orcs out of the country? I dunno) at the following link:

https://transtrib-tech.teachable.com/courses/memoq-resource-camp/lectures/49401658

#xl8 #TranslationStudies

The memoQ Regex Assistant button icon
Martin Brüggemeier
2 weeks ago

@GermanENtrans @linguacaps Yes, #regex can be used in cafetran, for defining segmentation rules/exceptions and for searches.

Kevin Lossner
2 weeks ago

@Martinbr48683 @linguacaps
Martin, can you tell me if users can apply #regex anywhere in #Cafetran? If yes, I have a few interoperability tests in mind.....

Kevin Lossner
2 weeks ago

Ain't it great to have a little time to kill a lot of admin(s)? You shouldn't have to fuck them to get paid.

https://www.translationtribulations.com/2023/09/flirting-with-fiverr-more.html

#training #consulting #regex #memoQ #xl8

Tom
2 weeks ago

It'll never cease to amaze me how #regex being the solution to a problem makes people run away as if they were set on fire 🤔

robrich
3 weeks ago
Scott Williams 🐧
3 weeks ago

Just in case this doesn't make sense to anyone, #Ansible has a commonly used tool called lineinfile that lets you do things like update server config files. You can use regex to identify which line to update. With backrefs, you can do some particularly intricate/powerful things so that the line that is written can be a combination of #regex matches and ansible variables.

https://dev.to/ferricoxide/code-explainer-regex-and-backrefs-in-ansible-code-gn7

today i forgot
3 weeks ago

Using #regex with a capture group in sed:

$ sed -E "s/(.*)\.jpeg/\1.jpg/g" <<< "example.jpeg"

The pattern matched within the parentheses is saved as a 'capture group', which is referenced with '\1'.
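
For comparison, roughly the same substitution in Python. This is just a sketch of the equivalent `re.sub` call, not part of the original tip.

```python
import re

# (.*) is capture group 1; r"\1" inserts whatever it matched.
renamed = re.sub(r"(.*)\.jpeg", r"\1.jpg", "example.jpeg")
print(renamed)  # example.jpg
```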

Kevin Lossner
3 weeks ago

This public page in an online course lists some of my favorite resources for working with #regex in #translation with #memoQ, #Trados or #Phrase or solving mysterious issues with text or handling #PDF files.

The site has other public pages with book and article recommendations and free configured resources for greater productivity in memoQ.

https://transtrib-tech.teachable.com/courses/memoquickies-resource-camp/lectures/49039953

#xl8 #l10n #TranslationStudies

icons for tools taught at the #memoQuickies Resource Camp
glitchcake⚡🍰
3 weeks ago

I should try writing a #regex engine for fun

❤️(L)ich
3 weeks ago

Something completely different... Anyone here who knows regex (Python)? In my matching I use [^;]+ to match everything except a semicolon. What I actually want to exclude now, though, is a semicolon followed by a carriage return. Somehow I'm stuck on this.

Would anyone have a tip?

#followerpower #python #regex
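
One possible approach to the question above, sketched in Python: allow a semicolon as long as a negative lookahead confirms it is not followed by a carriage return. The sample string is invented for illustration.

```python
import re

# Consume any non-semicolon, or a semicolon that is NOT followed by "\r".
FIELD = re.compile(r"(?:[^;]|;(?!\r))+")

sample = "a;b;\rc"
match = FIELD.match(sample)
print(repr(match.group()))  # 'a;b'
```

The match stops right before the `;\r` pair, while ordinary semicolons stay part of the match.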

Mark Stosberg
4 weeks ago

This deadly #regex was included in an .npmignore file:

*.db*

It broke the day a build generated a file whose randomly generated hash included `.db` in the middle of the file name:

chunk.208.dbf172ad32f72f21a5dc.js

#NodeJS devs: Check your .npmignore files today to make sure you use wildcards at the beginning or end of the line, but not both!

https://github.com/TryGhost/Ghost/commit/d4217bd3216569afd359cf2d0f3cf886ba36d85a
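
The failure is easy to reproduce with glob matching. The sketch below uses Python's `fnmatch` as a stand-in; that it behaves like npm's ignore matching for this simple pattern is an assumption, and `cache.db` is a made-up file name.

```python
from fnmatch import fnmatchcase

bundle = "chunk.208.dbf172ad32f72f21a5dc.js"

# Wildcards on both sides match ".db" anywhere in the name.
print(fnmatchcase(bundle, "*.db*"))     # True

# A wildcard only at the start requires the name to end in ".db".
print(fnmatchcase(bundle, "*.db"))      # False
print(fnmatchcase("cache.db", "*.db"))  # True
```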

Dave Mackey
1 month ago

I'm using the following #regex #code to strip a #url to its domain name and TLD:
url = url.replace(/^(?:https?:\/\/)?(?:[^./]+\.)?([^./]+\.[^./]+).*$/i, "$1");

For example, if url = ereader.perlego.com after stripping url = perlego.com

Cool. But this doesn't work for urls like jprs.co.jp. Technically, it performs correctly (stripped url = co.jp) but functionally it isn't what's needed (should be jprs.co.jp). Any suggestions on how to handle these sorts of cases?

#question #coding
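
A regex alone cannot know that `co.jp` is a suffix rather than a domain, so some kind of lookup is needed. The sketch below ports the pattern from the post to Python and contrasts it with a lookup-based helper. `registrable_domain` and `MULTI_LABEL_SUFFIXES` are hypothetical names, and the three-entry set is only a placeholder for a real source such as the Public Suffix List.

```python
import re
from urllib.parse import urlsplit

# The pattern from the post, ported to Python's re syntax.
STRIP = re.compile(r"^(?:https?://)?(?:[^./]+\.)?([^./]+\.[^./]+).*$", re.IGNORECASE)

# Hypothetical, deliberately tiny sample; a real list would be far longer.
MULTI_LABEL_SUFFIXES = {"co.jp", "co.uk", "com.au"}

def registrable_domain(url):
    """Return domain plus suffix, keeping one extra label for known two-part suffixes."""
    if "//" not in url:
        url = "//" + url  # lets urlsplit treat a bare host as a network location
    labels = urlsplit(url).hostname.split(".")
    keep = 3 if ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES else 2
    return ".".join(labels[-keep:])

print(STRIP.sub(r"\1", "jprs.co.jp"))             # co.jp
print(registrable_domain("jprs.co.jp"))           # jprs.co.jp
print(registrable_domain("ereader.perlego.com"))  # perlego.com
```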

Hal Canary
1 month ago

My wife's University is moving over to #CanvasLMS. She asked me how the regular expressions work, and I had a guess that happened to work, but I couldn't find any documentation on which of the dozens of dialects of #RegEx syntax they use. Anyone know?

Leon Cowle
1 month ago

Reason number 327 you know you love your job:

Volunteering (insisting) to code an emergency workaround fix for your prod ecom website at 11pm on a Friday night, whilst on vacation with your laptop tethered to your phone.

And having a LOT of fun doing so (#regex, #Fastly VCL, Fastly Fiddle, ftw)!

technicat
1 month ago
`Da Elf
1 month ago

I'd like much better #search #functions in my #SMS apps rather than syncing it to my #IMAP server and using Thunderbird (meh) or Perl / #RegEx from #BASH to find something important from a non-regular sender six months ago that is suddenly necessary again.

Rule of thumb: When working with #Pandas on a CSV file - and I mean actual CSV, not some ASCII gibberish - and someone suggests using a #RegEx, it's almost always a bad, nay, very bad idea. 🤭

barefootstache
1 month ago

#DailyBloggingChallenge (36/50)

#TIL that sed and pattern matching over =~ use two different #RegEx syntax variants. The former requires a lot of escape characters, whereas the latter doesn’t.

Take for example

\([0-9]\{2\}\-\)\+\([0-9]\{2\}\)

which is the correct syntax when using sed, whereas it won’t work with pattern matching over =~. In such a case it would be

([0-9]{2}-)+([0-9]{2})

This cost me so much debugging fun.

#bash

Dave Allen
1 month ago

Why can't I use a regex for search and replace in Excel? (macOS) I can do so in Adobe apps like InDesign and Bridge. #excel #office365 #regex

Sundeep
1 month ago

Hello!

I am pleased to announce a new version of my "CLI text processing with GNU awk" ebook.

Learn the `GNU awk` command step-by-step from beginner to advanced levels with hundreds of examples and exercises. Regular Expressions will also be discussed in detail.

Links:

* PDF/EPUB versions: https://learnbyexample.gumroad.com/l/gnu_awk (free till 31-August-2023)

* Web version: https://learnbyexample.github.io/learn_gnuawk/

* Markdown source, example files, etc: https://github.com/learnbyexample/learn_gnuawk

* Interactive TUI app for exercises: https://github.com/learnbyexample/TUI-apps/blob/main/AwkExercises

I would highly appreciate it if you'd let me know how you felt about this book. It could be anything from a simple thank you, pointing out a typo, mistakes in code snippets, which aspects of the book worked for you (or didn't!) and so on. Reader feedback is essential and especially so for self-published authors.

Happy learning :)

#linux #awk #regex #ebook

Cover image for "CLI text processing with GNU awk" ebook
Akshar Varma
1 month ago

A couple of other #emacs things you can combine together with the ideas above:

1. Narrowing the buffer to only the portion of interest (or element, subtree, etc.). This is so good. It clears space and visual clutter, it ensures unwanted things don't get changed out of your view. I have also used it when presenting something in class; I didn't want the whole buffer to use up space, so I narrowed to the relevant part, allowing me to direct attention, increase font size and other nice things.

2. Swapping the keybindings for isearch with #regex and without regex. So my default search uses regexp. For most cases, you may not even notice the difference since you would be using alphanumeric characters in your search term. The only time it genuinely affects me is when I search for a `.` in the text and I have to escape it so that it searches for a literal `.` and not a wildcard match. But that is a small enough case that I can ignore that.

Akshar Varma
1 month ago

Another #emacs editing discovery. Every time my thought process goes along the lines of: "I can also do this!?!!"

Today's cast: #occur, #regex isearch, wgrep, and optionally keyboard macros, iedit and wdired.

(h/t This thread: https://emacs.ch/@ramin_hal9001/110933437057616428.
@ramin_hal9001 and @cwebber were mentioning many cool emacs tips and I learnt that I can *edit* what comes out of occur!)

TIL that you can switch from isearch to occur mid search! So you can do something like:
1. Start incremental (regexp) search in an editable buffer.
2. Once satisfied with the regex/search term, do `M-s o` to have all matching lines end up in an occur buffer.
3. Press `e` to enable wgrep giving you an editable buffer with only matching lines.
4. Do whatever editing you like: iedit, regex replace, keyboard macros, whatever.
5. `C-c C-c` when done and the changes get reflected in *original buffer*.

Triggering occur after incremental search gets you the best of both worlds: the visual feedback on the regexp search helping to interactively adjust it. You jump to occur only when you're satisfied with the matches. It will be as if you had magically typed out that complex regexp free-hand.

@rml But tbf maybe they just hate #regex ?

[^\x00-\x7F]+
Assuming i understand the issue

Kalamata Hari
1 month ago

@fosskers We can also revisit our first step of extracting words from a string. Our code says that a "word" is a sequence of 1 or more alphanumeric characters, but then immediately checks that its length is at least 3. We can do that as part of the initial regex:

my @words =
    uniqstr
    grep !$banned{$_},
    map /[^0-9]/ ? lc : (),
    $string =~ /[a-zA-Z0-9]{3,}/g;

We can make our regex more concise, too. The character class \w represents the set of all alphanumeric characters and _ (underscore). By default, this includes all Unicode letters and digits. However, by using the /a regex flag, we can restrict \w (and similar shorthand sets) to only match ASCII characters.

Switching to /\w{3,}/ag would not be entirely correct, though: It would match _, which is an illegal symbol. We can fix this via a detour through double negation: [^\W] matches any character that is not a member of \W (non-word characters), the inverted \w (word characters) set. In other words, [^\W] is equivalent to \w, but it lets us exclude more characters by adding them to the bracket group: [^\W_] matches any character that is not a non-word character or an underscore, i.e. any word character that is not _.

my @words =
    uniqstr
    grep !$banned{$_},
    map /[^0-9]/ ? lc : (),
    $string =~ /[^\W_]{3,}/ag;

Furthermore, [^0-9] can be written as \D (non-digit character) with the /a regex flag in effect. In this case we don't need /a, however: /\D/a would allow non-ASCII digits to pass through whereas /\D/ would reject them, but the distinction is moot because we know our strings are all ASCII.

my @words =
    uniqstr
    grep !$banned{$_},
    map /\D/ ? lc : (),
    $string =~ /[^\W_]{3,}/ag;

#perl #regex
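
For readers more at home in Python, the final pipeline could look roughly like this. It is a loose port, not the author's code: `extract_words` is a hypothetical name, `re.ASCII` plays the role of Perl's /a flag, and the sample input is invented.

```python
import re

# ASCII letters and digits only, at least 3 in a row; "_" is excluded via [^\W_].
WORD = re.compile(r"[^\W_]{3,}", re.ASCII)

def extract_words(text, banned):
    """Lowercased, de-duplicated words that contain at least one non-digit."""
    seen = {}  # dict keeps insertion order, like uniqstr keeps first occurrences
    for token in WORD.findall(text):
        if re.search(r"\D", token):       # drop tokens made purely of digits
            word = token.lower()
            if word not in banned:
                seen.setdefault(word, None)
    return list(seen)

print(extract_words("The cat_dog 123 ab abc ABC x9z", {"the"}))
# ['cat', 'dog', 'abc', 'x9z']
```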

hoergen
1 month ago

I have a question for the friends of regular expressions.

How do I build my search & replace string so that the trailing part "-00.00.00.000-01.23.10.760", in all its variations, is removed from every line of this list? I'm drawing a blank right now.

So :%s/ //g ?

- Lernen_Teil 1-00.00.00.000-01.23.10.760
- Lernen_Gänse gehen gerade aus-00.00.00.000-00.41.05.200
- Lernen_Das Ende der Kurve-00.00.00.000-00.42.08.600
- Lernen_Das Dudelsackkonsortium-00.00.00.000-00.42.12.840
- Lernen_In der Abendsonne-00.00.00.000-00.41.02.000
- Lernen_Fronkensteen-00.00.00.000-00.42.11.232
- Lernen_Mittlerer Westen im Osten-00.00.00.000-00.42.12.080
- Lernen_Avulsives Schnalzen-00.00.00.000-00.42.11.928
- Lernen_Die sieben Stufen der Rampe-00.00.00.000-00.41.06.288
- Lernen_Gerade bei Kurven-00.00.00.000-00.41.10.488
- Lernen_Pfeifensaiten-00.00.00.000-00.41.56.784
- Lernen_Find And Replace-00.00.00.000-00.42.18.080
- Lernen_Viele Dentron-00.00.00.000-00.40.45.480
- Lernen_Gänseblümchen-00.00.00.000-00.42.18.280

#RegEx #Vim #ReguläreAusdrücke
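
One way to describe the trailing part is "two timestamps of the form -HH.MM.SS.mmm at the end of the line". The Python sketch below tests that idea on two lines from the list; it is an illustration of the pattern, not a Vim command.

```python
import re

# Two timestamps of the form -HH.MM.SS.mmm at the very end of the line.
TRAILING_TIMES = re.compile(r"(?:-\d{2}\.\d{2}\.\d{2}\.\d{3}){2}$")

lines = [
    "- Lernen_Teil 1-00.00.00.000-01.23.10.760",
    "- Lernen_Find And Replace-00.00.00.000-00.42.18.080",
]
cleaned = [TRAILING_TIMES.sub("", line) for line in lines]
print(cleaned)  # ['- Lernen_Teil 1', '- Lernen_Find And Replace']
```

In Vim, something along the lines of `:%s/\v(-\d{2}\.\d{2}\.\d{2}\.\d{3}){2}$//` should be roughly equivalent, though that form is untested here.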

Schenkl 🏳️‍🌈
1 month ago

Is there no #Quantifier in #regex for "n or m times"?

That is, not *, +, {n}, {n,} or {n,m}
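
As far as the common regex flavours go, there does not seem to be a dedicated quantifier for this; the usual workaround is an alternation of two exact counts. A minimal Python sketch, with n = 2 and m = 5 chosen arbitrarily:

```python
import re

# "a" exactly 2 or exactly 5 times: spell out both counts as alternatives.
TWO_OR_FIVE = re.compile(r"(?:a{2}|a{5})")

for candidate in ("aa", "aaa", "aaaaa"):
    print(candidate, bool(TWO_OR_FIVE.fullmatch(candidate)))
# aa True
# aaa False
# aaaaa True
```

An equivalent spelling is `a{2}(?:a{3})?`, which matches the smaller count and then optionally the difference.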

Luke T. Shumaker
1 month ago

(Example: Read and "understand" #Python's html.parser.locatestarttagend_tolerant -- now tell me how it will handle <A 0"="> -- clearly not what the author intended or what most readers will understand) #regex

Luke T. Shumaker
1 month ago

I'm generally of the stance that #regex isn't nearly as bad as people say, to read or to write; especially if using something like #Python's re.VERBOSE. But holy cow is it easy to accidentally write an expression that does surprising things with look-behind/look-ahead. re2 and #GoLang not supporting those is a good call, IMO.
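
To illustrate the readability point, here is a small sketch of `re.VERBOSE` in use. The date pattern is an invented example and has nothing to do with html.parser.

```python
import re

# With re.VERBOSE, whitespace in the pattern is ignored and "#" starts a comment.
DATE = re.compile(
    r"""
    (?P<year>\d{4})   # four-digit year
    -
    (?P<month>\d{2})  # two-digit month
    -
    (?P<day>\d{2})    # two-digit day
    """,
    re.VERBOSE,
)

match = DATE.fullmatch("2023-08-20")
print(match.group("year"), match.group("month"), match.group("day"))  # 2023 08 20
```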

takeonrules
1 month ago

Explaining new functionality added to my random-table.el package. This explanation includes the Emacs Lisp code and some explanation. Ultimately, building these random tables grows my personal GM notebook; encoding logic and making it readily and consistently available (when I have my computer).

http://takeonrules.com/2023/08/20/adding-rudimentary-handling-of-math-operands-in-random-table-package/

#regex #ttrpg #rpg #osr #emacs

Developers: how do you personally pronounce "regex"?

* Rej-ex ("Rej" as in "register")?
* Reg-ex ("Reg" as in "regular")?
* Something different? - share in the comments!

#Regex #RegularExpressions #Programming

lamp
1 month ago
Hmm how to make the last part optional without breaking the capture group? #regex
postmodern
1 month ago

Protip: if you want to validate international text, never use [A-Za-z] in your regexes. Use \p{L} (unicode letter), \p{Lu} (uppercase unicode), \p{Ll} (lowercase unicode).
https://ruby-doc.org/3.2.2/Regexp.html#class-Regexp-label-Character+Properties
#ruby #regex
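
The same pitfall exists outside Ruby. Python's standard `re` module has no `\p{L}`, so the sketch below uses `[^\W\d_]` as an approximation (word characters minus digits and underscore). It is not an exact equivalent of `\p{L}`, and the sample names are made up.

```python
import re

ascii_only = re.compile(r"[A-Za-z]+")
# Word characters minus digits and underscore: roughly "any letter" for str patterns.
unicode_letters = re.compile(r"[^\W\d_]+")

for name in ("J\u00fcrgen", "Zo\u00eb", "\u0141ukasz"):
    print(name, bool(ascii_only.fullmatch(name)), bool(unicode_letters.fullmatch(name)))
# Jürgen False True
# Zoë False True
# Łukasz False True
```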

Marco "Ocramius" Pivetta
2 months ago

In my defense, I did comment my code in detail... #regex

Massive multiline regular expression in #PHP, with comments in it
Stark
2 months ago

It is already working!

Using my already existing #MastoBot implementation, an #opensource package for developing Mastodon bots, I was able to make @remindMe in just a few minutes!

The implementation was quite simple, and the most complicated part is probably just the #regex. As a fun experiment, I am using the native Mastodon scheduling feature. Rather than having the bot use some database and regular checks, when you mention it, it simply creates a scheduled post for the requested time. Thus, the timing and reminder are completely independent from the actual bot and its runtime!

Feel free to check it out!

I still need to add an acknowledging message just to tell you that the reminder had been set. But until then, I'll just make it favorite your post.

You can reply to any other post as a direct mention using the format listed, and you will be reminded.

The same library was used to build @3dprinting and @Python

A screenshot illustrating how the bot reminds you of a post.
Stark
2 months ago

I sometimes just love #regex!

I am busy with testing for my next #MastoBot implementation, for a remindMe bot idea.

The main idea is to allow users to mention it and say @remindMe 2 weeks 3 days 2 hours 5 minutes, which would then in turn message them, reminding them of the post which they replied to with this comment.

Such functionality can sometimes be difficult, and regex gets its hate for speed, complexity, and readability. But this just works great in #Python for a problem like this.

Here is the #Gist of an early prototype and example for those interested!

The bot is also at @remindMe, but it isn't operational yet. You can follow it for updates.

https://gist.github.com/e-dreyer/ce0f5e8c51d6454f91901f78f9e04b77

@Python #Python
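
A rough idea of how such a mention could be parsed, assuming only the four units named in the post. This is not the code from the linked gist; `parse_offset` and the pattern are hypothetical.

```python
import re
from datetime import timedelta

# A number followed by one of the supported units, singular or plural.
UNIT = re.compile(r"(\d+)\s*(week|day|hour|minute)s?\b", re.IGNORECASE)

def parse_offset(text):
    """Sum every '<number> <unit>' pair in the text into one timedelta."""
    totals = {}
    for amount, unit in UNIT.findall(text):
        key = unit.lower() + "s"  # timedelta expects weeks=, days=, hours=, minutes=
        totals[key] = totals.get(key, 0) + int(amount)
    return timedelta(**totals)

print(parse_offset("@remindMe 2 weeks 3 days 2 hours 5 minutes"))
# 17 days, 2:05:00
```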

eklem
2 months ago

Second is Mathias Bynens' great work on regular expressions in JavaScript. What I need there is stuff concerning Emojis, to actually extract/match all the latest Unicode Emojis. This I need for two things - a regular expression library that I want to use for a small JavaScript search engine library, and a one-time-pad encryption/decryption library.

https://social.vivaldi.net/@mathias@mastodon.xyz
https://github.com/mathiasbynens/rgi-emoji-regex-pattern

#dev #JavaScript #funding #oss #unicode #regex #emoji #encryption #decryption #otp #search

Walker Boh🛡
2 months ago

A little while ago one of the bigger C Sharp guys was talking about using split/join to remove a given option from a string (so "&option=one", which could be anywhere in the string).

In the discussion I said it could be done with one line of regex versus the five lines required for split/join. The riposte was that regex is notoriously slow. I wasn't so sure of that so I decided to put it to the test tonight.

I wrote some code that generates 100K random input lines, and then runs the two methods of removing option=one head to head. Knowing that regex is mostly expensive to compile, I also added a precompiled regex version (but that sort of defeats the original purpose, as it's the same number of lines as split/join).

For giggles I also put them head to head in two languages - Java and Python. Hat tip to milady @scrumtuous

The results were interesting. Regex IS the slowest, but in Java not by much (about 20%). In Python the difference is striking at 100% slower.

Compiled regex wins the day on speed in both languages coming across the line at 52ms in Java and 165ms in Python.
Fun brain stretch for the night...

You can find the code here:
https://github.com/nakedmcse/RegexTest

#regex #java #python

Python results
Java results
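
For context, the two approaches being compared might look something like this in Python. This is a sketch with hypothetical function names, not the code from the linked repository, and it does not reproduce the timings.

```python
import re

# Precompiled once; "&option=one" must end at the next "&" or at the end of the string.
OPTION = re.compile(r"&option=one(?=&|$)")

def remove_with_regex(query):
    return OPTION.sub("", query)

def remove_with_split(query):
    # Split on "&", drop the unwanted pair, and glue the rest back together.
    return "&".join(part for part in query.split("&") if part != "option=one")

query = "a=1&option=one&b=2"
print(remove_with_regex(query))  # a=1&b=2
print(remove_with_split(query))  # a=1&b=2
```

The lookahead keeps the regex from clipping longer values such as `option=oneself`, which the split version also leaves alone.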
Kevin Stewart
2 months ago

Do you get off on #regex? Well, boy howdy, do I have something for you. Needed to run a brief analysis on browser types from #Apache access logs with #Pandas. This seems to work for *most* log lines I had presented to me:

'^(\S+)\s(\S+)\s\[(.*?)\]\s(\S+)\s"(\S+)\s(\S+)\s(\S+)"\s(\d+)\s(\S*-\S*|\d+)\s(\S*-\S*|\d+)\s"(.*?)"\s(-|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(?:,\s\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})*)$'

Enjoy

#nvim regexplainer, the plugin which tells you what your #regex is doing, now has long-awaited support for lookbehind assertions, thanks to the fixes in the upstream #treeSitter parser

Give'r a shot and let me know what you think

https://github.com/bennypowers/nvim-regexplainer/pull/39

Southern Wolf 🐧🦀
2 months ago

"https:\/\/(.*)\/u\/(.*)"

Why is #Regex so powerful... Yet so cursed all at the same time. :blobfoxcomputerterrified:​

Mark Gardner ‍:sdf:
2 months ago

@ajaxStardust @ajaxStardust @Perl Don't parse #HTML with #RegEx.
Just don't: https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
Use a #DOM parser to add the attributes, e.g., in #Perl:

use Mojo::File;
use Mojo::DOM;

for ( map Mojo::File->new($_), @ARGV ) {
    my $dom = Mojo::DOM->new( $_->slurp );
    $dom->find('h1, h2, h3, h4, h5, h6')->each( sub {
        $_->attr( map { $_ => $_[0]->text =~ s/\s+/_/gr } qw(name id) );
    } );
    $_->spurt( $dom->content );
}

The infamous “Zalgo” answer to a question on Stack Overflow about matching HTML with regular expressions
Nils Goroll
2 months ago

.foreach() for regular expression matches has come to #opensource Varnish HTTP Cache.

Our #pcre2 #regex module https://gitlab.com/uplex/varnish/libvmod-re for #varnishcache now also supports iterating over matches on strings and HTTP bodies

ajaxStardust
2 months ago

#regex #question

I was going through some HTML markup/content where my <h1>, <hx> tags have no anchors/IDs. It's a pain in the butt to do that manually if I want anchors. It occurred to me: why don't I just process the markup and use the <h1> (..contents ...) </h1>
but the (...contents...) atom needs filtering. E.g. "this just in", maybe I want "this_just_in"

That's not really possible w/ #PCRE is it?

Maybe this is something #sed can do? Im not super fmlr w/ all gnutils

In my book I talk a bit about regular expressions, but don't want to turn a Terraform book into a regex book.

What is the best resource- book, blog, tutorial, video, etc- that I could point someone to for them to learn more about regular expressions?

#tech #terraform #regex #golang

Doug Parker 🕸️
2 months ago

@emattiza @yoavlavi The challenge with competing against #regex is that the concept is so ubiquitous as to have direct support in basically every modern programming language. It's either in the standard library or has a carve out in the language syntax. Getting someone to choose an alternative to that is a big ask.

To be a viable alternative, I think Melody needs to:
1) Be as available as regex in basically every major language and environment: Java, C#, Python, Swift, SQL, PHP, Ruby, etc.
2) Take up enough "developer mindshare" such that people recognize and conceptualize Melody as a viable alternative to regex.

The first is necessary for the second and the second is *really* hard. Melody needs to get to the point where universities, Stack Overflow, and ChatGPT say "You could solve this problem with regex, but you should use Melody because the community generally agrees it's better".

This is as much a marketing problem as a technical problem.

Jan :rust: :ferris:
2 months ago

Happy #Regex

(for screenreaders: what follows is a regex that matches on a bunch of happy emoji faces, connected with the "or" operator "|")

/😁|😀|😂|🤣|😃|😄|😅|😆|😊|🥰|☺|🙂/gm

Kalamata Hari
2 months ago

@randomgeek The issue isn't greediness, though. A non-greedy #regex will go through the same motions, just in a different order. The trouble is caused by a backtracking regex engine trying to deal with overlapping matches (like the two adjacent .* parts, but also .*=.* because . can match =). This part is O(n^3), I think?

૮༼⚆︿⚆༽つ
2 months ago

TODO: compare compressed bundle size (brotli) between:

* parser written using either PEG or Parser Combinator in Rust (or whatever) which compile to #wasm

vs

* RegEx in JavaScript (the #regex can be written and generated using pomsky or whatever)
https://pomsky-lang.org

Mark Gardner ‍:sdf:
2 months ago

@scruss Executing #Perl in a #regex substitution is a cursed pattern

perl -pwle 's|(?<=,)(\d+)(?=>)|sprintf("%.0f", 440*2**(($1-34)/12))|eg;'

... perfectly normal regex skillz, no?

#perl #regex
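
Python has a less cursed-looking counterpart to Perl's /e flag: `re.sub` accepts a function as the replacement. The sketch below reuses the pattern and formula from the one-liner; the input string is invented, since the original data is not shown.

```python
import re

# A number that sits between "," and ">" (lookarounds keep both delimiters in place).
NOTE = re.compile(r"(?<=,)(\d+)(?=>)")

def to_frequency(match):
    """Replace the matched note number with the value of the post's formula."""
    note = int(match.group(1))
    return "%.0f" % (440 * 2 ** ((note - 34) / 12))

print(NOTE.sub(to_frequency, "<a,34> <b,46>"))  # <a,440> <b,880>
```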

sofia
2 months ago

#poll for people doing programming/coding:

how useful do you think #regEx is?

My tip for #rstats users new to the game: Take your time and browse through the functions available in the stats, base and utils packages that come with each #rstats installation. Just look at names that sound 'odd'.
You will find some things that are otherwise promoted by auxiliary packages, or things you would start to code yourself.
My example is trimws() from base to remove leading/trailing whitespace. I have seen rather complicated proposals over at #stackoverflow with #regex and the like.

Screenshot of #rstats stats functions shown in #RKWard
Help page of the trimws() function of the base #rstats package, shown in #RKWard
Habsburg AI
2 months ago

We have found a deeply cursed #regex https://regex101.com/library/tA9pM8

Ewan :apple_inc:
2 months ago

not me learning #Regex through #Firefish word filters 😅

christophe
2 months ago

Hello, could someone tell me which #regex is used by #browsers to validate email format (in <input type=email>)?

eklem
3 months ago

For the good of the people, the JavaScript module words-n-numbers, version 9.1.1 can now extract all those special unicode emojis.
https://eklem.github.io/words-n-numbers/demo/

#regex #JavaScript #browser #nodejs #unicode #emoji #emojis

Demo of emoji-extraction in browser. Text input: "🧑🏽‍🤝‍🧑🏾 people holding hands: medium skin tone, medium-dark skin tone
👩🏻‍🤝‍👩🏿 women holding hands: light skin tone, dark skin tone
👩🏼‍🤝‍👨🏾 woman and man holding hands: medium-light skin tone, medium-dark skin tone
👨🏼‍🤝‍👨🏾 men holding hands: medium-light skin tone, medium-dark skin tone"

Output: [ "🧑🏽‍🤝‍🧑🏾", "👩🏻‍🤝‍👩🏿", "👩🏼‍🤝‍👨🏾", "👨🏼‍🤝‍👨🏾" ]

@timbray I wonder how close the match is to #Hyperscan. That’s Intel’s high-perf regex engine, which disallows backreferences, lookarounds and capture groups, much like I-Regex.

https://www.intel.com/content/www/us/en/developer/articles/technical/introduction-to-hyperscan.html

If you want more fodder for considering which regexes would work in your spec, they have a corpus of thousands available.

Hyperscan does allow `.`, `\s`, `\d` and so on, so you’re even stricter!

#regex

#regex #protip :
If your instance uses #glitchsoc , your Live Feeds tab will allow you to filter out posts based on a regular expression. But what if you want to filter out posts that don't contain a certain word (i.e. searching for a specific term)? You can use this regex:

^((?!term)[\s\S])*$

Explanation: https://stackoverflow.com/a/406408

#wardiPublicPost #search #filter
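
To see what the pattern does, here is a small Python sketch with made-up posts. It assumes the filter hides every post the regex matches, as described above.

```python
import re

# Matches only strings in which "term" never occurs: each character is consumed
# only if "term" does not start at that position.
WITHOUT_TERM = re.compile(r"^((?!term)[\s\S])*$")

posts = ["nothing to see here", "this mentions the term once", "multi\nline post"]

hidden = [post for post in posts if WITHOUT_TERM.search(post)]
shown = [post for post in posts if not WITHOUT_TERM.search(post)]
print(shown)  # ['this mentions the term once']
```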

Ivan Enderlin 🦀
3 months ago

Hyperscan, https://github.com/intel/hyperscan.

> Hyperscan is a high-performance multiple regex matching library. It follows the regular expression syntax of the commonly-used libpcre library, but is a standalone library with its own C API.
>
> Hyperscan uses hybrid automata techniques to allow simultaneous matching of large numbers (up to tens of thousands) of regular expressions and for the matching of regular expressions across streams of data.

#regex #engine #performance

Regular expression hexagonal crossword puzzle.

#regex #programming

A hexagonal grid with regular expressions on the outside of each rank, file, and diagonal as clues.
Ivan Enderlin 🦀
3 months ago

Regex engine internals as a library, https://blog.burntsushi.net/regex-internals/.

An incredible and must-read blog post explaining the internals of the `regex` Rust crate and how it has moved from a “monolithic” to a “multi-library” project. It explains in detail the problems regex engines have to deal with, the importance of literal optimisations, the NFA data type, and the various regex engines that are implemented (incl. a meta engine, to rule them all).

It’s now 1.5 times faster on average 👌.

#RustLang #regex

Ernir Þorsteinsson
3 months ago

Was writing a #regex. Thought this was a perfect time to use one of those newfangled #AI things (#ChatGPT), this kind of well specified problem requiring boring arcane syntax should be right up its alley

It spat it out near instantly, the running example showing exactly what I would expect

Only, the actual regex created and shown in the example isn't even syntactically valid

David Colarusso
3 months ago

Fellow #LawFedi folks working on quantitative stuff! Perhaps you occasionally need to extract data from a voluminous dataset. Perhaps you've been thinking, "I wonder if a #RegEx-#LLM hybrid approach would speed things up?" FWIW, my tentative answer is, "yes."

Here's a sample notebook for extracting data from OCRed PDFs using RegEx and LLMs: https://github.com/colarusso/entity_extraction/blob/main/PDF%20Entity%20Extraction%20with%20Regex%20and%20LLMs.ipynb

Enjoy! Now for some thoughts on its use in scholarship. See next item in thread.👇

Thomas Rigby
4 months ago

I wrote a little #blog post about #regex because it's hard!

https://thomasrigby.com/posts/regular-expressions-are-hard/

Nils Goroll
4 months ago

regsub() on bodies has finally arrived for #opensource Varnish HTTP Cache.

Our #pcre2 #regex module https://gitlab.com/uplex/varnish/libvmod-re for #varnishcache now also supports substitutions on bodies. Similar to the recently announced .match_body() method, this feature supports matches across storage segments while avoiding copies by using PCRE2's partial match feature.

Another big thank you to Philip Hazel and Zoltan Herczeg for their great work on the essential regular expression library.

Nils Goroll
4 months ago

A bugfix gives me an excuse to mention that, for some time now, vmod_re https://gitlab.com/uplex/varnish/libvmod-re - our #pcre2 #regex module for #varnishcache - also supports matches against bodies.
The implementation supports matches across storage segments while avoiding copies by using PCRE2's partial match feature.
A big thank you to Philip Hazel and Zoltan Herczeg for their great work on the essential regular expression library. @slimhazard

Joe Lanman
4 months ago

one of my favourite online #puzzles - Regex Golf
https://alf.nu/RegexGolf
#regex

I like how #calckey tells you which of your word #mutes caused a post to be #muted and blurs it out, but when that was a regex it would be cool if instead of just displaying the #regex it could also display the matched text, or the capture groups.

There's this thing people do where they append "bro" on the end of any word to turn it into a pejorative to make generalizations about people based on their gender which is a shitty thing to do.

Here's a
#regex to mute it in #calckey:

/\b\w*bro(s)?\b/

#discrimination #sexism #rude #mute #men

The cool thing about #regex support in #calckey's filters is that you don't have to filter for all the plural forms of words; /\bgun/ matches "gun" and "guns"!
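
The flip side of leaving the right-hand side open is that longer words match too. A quick Python check, assuming the filter behaves like an ordinary regex search (the test words are made up):

```python
import re

# \b anchors only the start of the word; nothing constrains what follows "gun".
pattern = re.compile(r"\bgun")

for word in ("gun", "guns", "gunmetal", "begun"):
    print(word, bool(pattern.search(word)))
# gun True
# guns True
# gunmetal True
# begun False
```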

Josh Bruce
5 months ago

#Regex isn't my normal deal so any assistance would be greatly appreciated. See if there's someone who can beat me to it. lol

{!! date:
created=2023-01-01
!!}

I want to match everything between the opening and closing.
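
One possible starting point, sketched in Python: a lazy group between the escaped delimiters, with `re.DOTALL` so that `.` also crosses line breaks. Whether the target tool uses the same regex flavour is unknown, so treat this as an illustration only.

```python
import re

# Braces are escaped; (.*?) lazily captures everything up to the first "!!}".
BLOCK = re.compile(r"\{!!(.*?)!!\}", re.DOTALL)

text = "{!! date:\ncreated=2023-01-01\n!!}"
match = BLOCK.search(text)
print(repr(match.group(1).strip()))  # 'date:\ncreated=2023-01-01'
```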

Inautilo
5 months ago

#Development #Tools
A list of programming playgrounds · A community-sourced collection of playgrounds for web developers https://ilo.im/12dmwi

_____
#Playground #WebDevelopment #WebDev #Frontend #Backend #HTML #SVG #CSS #JavaScript #TypeScript #JSON #RegEx #Unicode #Git #PHP #Python #Ruby #Rust #SQL #DNS

#chatGPT is #op for writing #regex and I *know* how to write regex!

All those braincells I can finally put to better use now!!

Piet van Zoen
6 months ago

Just started using #ChatGPT and #GithubCopilot for a personal project and I'm pretty impressed. They've been super helpful with writing tests and figuring out complex #SQL queries and #Regex.

Also, I accidentally discovered that Copilot can help write commit messages. Using a commit message template with a prompt comment and running git commit --verbose, Copilot does a great job of reading the diff and figuring out the appropriate commit message.