#CSV
I found it! In the wild! Actual production data! A #CSV field with a line break!
I thought it was the sort of thing that only exists in the specification, but no, there it is, I actually encountered one in real-world data!
And thus, I am proven correct once more: Don't use split, don't use regex, don't iterate over lines; use a proper CSV #parser!
And for those of you who are a bit confused right now: Yes, quoted CSV fields can totally contain newline characters.
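A minimal sketch of the difference, assuming a made-up two-record sample in which one quoted field contains a line break. It uses only Python's standard `csv` module; the sample data and variable names are illustrative.

```python
import csv
import io

# Hypothetical sample: the quoted field in the first record contains a line break.
raw = 'id,comment\r\n1,"first line\nsecond line"\r\n2,plain\r\n'

# Naive approach: splitting on line breaks tears the quoted field apart.
naive_rows = [line.split(",") for line in raw.splitlines()]

# Proper approach: csv.reader tracks quoting. newline="" leaves line-break
# handling to the parser instead of the text layer.
parsed_rows = list(csv.reader(io.StringIO(raw, newline="")))

print(len(naive_rows))   # the naive split sees one "row" too many
print(len(parsed_rows))  # header plus two records
print(parsed_rows[1])    # the embedded line break survives inside the field
```

When reading from a real file, the same applies: open it with `newline=""` and hand the file object to `csv.reader`.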
#iconv & #csvq to the rescue!
That at least lets me work around part of the brokenness.
It's still frustrating that an organization as huge as the #DSGV fails at such basic things and that you have to work around its shortcomings.
At least I get to brush up on my csvq skills again.
An absolutely brilliant tool that lets you query and edit a CSV/TSV/JSON file like an SQL database:
https://mithrandie.github.io/csvq/
I'm currently trying to import a data export from a #Kreissparkasse account into @ff3.
Ugh, everything at the #Sparkasse is so broken!
- the #CSV is ISO8859-1
- random spaces show up in the middle of words in the payment reference (Verwendungszweck)
- for "BIC (SWIFT-Code)", the BLZ (bank sort code) is used on bank-internal transfers
- there is no proper API
The entire #Banking sector needs to be bulldozed once and then rebuilt by people with technical expertise.
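The encoding point can be sketched in a few lines: decode the bytes as ISO-8859-1, parse, and re-encode as UTF-8, which is roughly what `iconv -f ISO-8859-1 -t UTF-8` does on the command line. The sample row, the column names, and the semicolon delimiter are assumptions for illustration, not taken from an actual export.

```python
import csv
import io

# Hypothetical export bytes: ISO-8859-1 encoded, semicolon-delimited (both assumed).
latin1_bytes = "Buchungstag;Verwendungszweck\n01.09.23;Miete für März\n".encode("iso-8859-1")

# Decode with the actual source encoding before parsing.
text = latin1_bytes.decode("iso-8859-1")
rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=";"))

# Re-encode as UTF-8 for tools that expect it.
utf8_bytes = text.encode("utf-8")

print(rows[1])
```

Decoding the same bytes as UTF-8 would raise a `UnicodeDecodeError` on the umlauts, which is a quick way to notice the mismatch.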
^^ I, #Luc, the national lead candidate, need your help, dear friend Jean-Claude. #LW23
#Satir #Walen2023 #CSV
𝐃𝐞 𝐖𝐨𝐥𝐥𝐞𝐟 𝐚𝐦 𝐒𝐜𝐡𝐨𝐟𝐬𝐩𝐞𝐥𝐳 (The Wolf in Sheep's Clothing)
A young person took the trouble to work through #Luc's political career. #Luc is not at all happy about it and is whining. Read what he writes to his friend Jean-Claude.
https://www.luxembourgjungle.lu/app/flex/blog/preview/516693532
The original #awk just recently merged native #CSV support. Apparently it handles UTF-8 & emojis just fine also: https://github.com/onetrueawk/awk/commit/0a497bc5a1ad26e21a0e5018e884c7435b112c53#diff-a5be69a63e2dbcc302b4ecdf5dae941356d6cfa436b69bd14d587a6494675113R42
#sqlite vs #json vs #csv
What If OpenDocument Used SQLite?
🔗https://www.sqlite.org/affcase1.html 🔗 🟠 https://news.ycombinator.com/item?id=37553574
"file over app"
🔗https://twitter.com/GalaxyKate/status/1703514991679128038
PowerShell 7: disabling a Microsoft 365 service via Graph
https://gioxx.org/2023/09/18/powershell-7-disattivazione-di-un-servizio-microsoft-365-via-graph/
#Lavoro #AccountUtente #CSV #InEvidenza #Lavoro #Microsoft #Microsoft365 #MicrosoftExchange #MicrosoftGraph #MicrosoftOffice #PowerShell #PowerShellPerMicrosoft365 #Skype
How to import a .csv file to the ubuntu terminal? #commandline #csv #upload
Fantasy, Retire the CSV, B-Trees, and Modern LZ Compression
#book #fantasy #movie #csv #database #ComputerScience #compression #cpp
👉 Please retweet if you ❤ Plurrrr. Thanks! 👍
Trdsql – Query flat files (CSV, JSON, etc.) with SQL
https://korben.info/trdsql-outil-puissant-interroger-fichiers-plats-sql.html
Keeping this one handy.
Qsv – A powerful tool for easily managing your CSV files
https://korben.info/qsv-outil-puissant-gerer-fichiers-csv-facilement.html
awk is getting an update!? The original Awk now supports UTF-8 and CSV!
https://qiita.com/ko1nksm/items/1a3e711bbd925657f5fd?utm_campaign=popular_items&utm_medium=feed&utm_source=popular_items
#qiita #ShellScript #AWK #UNIX #CSV #シェル芸
Quick R Quiz - Importing CSV Files!
Which of the following functions is used to import CSV data in R?
#rstats #codingquiz #codingchallenge #datascience #csv #excel

Oh noes, apparently I haven't considered different delimiters for the left and right #CSV in #CsvDiff.😱
Someone reported a bug in `qsv diff` (which uses csv-diff) with this scenario.
https://github.com/jqnatividad/qsv/issues/1258
I'll have a look at it tomorrow.
Glad csv-diff is actively used! ❤️
#Sondaggi #Lussemburgo
TNS Ilres poll:
"Who would you prefer as Prime Minister?"
Xavier #Bettel (#DP|RE): 34% (+4)
Luc #Frieden (#CSV|EPP): 21% (+1)
Paulette #Lenert (#LSAP|S&D): 20% (-3)
Sam #Tanson (#DG|G/EFA): 5% (-1)
None of these/Don't know: 12% (-9)
Not interested: 9% (+9)
Fieldwork: 7-16 August
+/-: vs. 23 March-6 April
Respondents: 1887
#Sondaggi #Lussemburgo
TNS Ilres poll, seat projection:
#CSV|EPP: 19 seats (+2)
#LSAP|S&D: 13 (+1)
#DP|RE: 11
#DG|G/EFA: 7 (-1)
#Piraten|G/EFA: 5 (-1)
#ADR|ECR: 3 (-1)
#DL|LEFT: 2
#Fokus|Pragmatic centre: 0
#KPL|Far left: 0
#Volt|G/EFA: 0
Total seats: 60
Majority: 31
Fieldwork: 7-16 August
+/-: vs. 23 March-6 April
Respondents: 1887
#Sondaggi #Lussemburgo
TNS Ilres poll:
#CSV|EPP: 28% (+1)
#LSAP|S&D: 20% (+2)
#DP|RE: 17%
#DG|G/EFA: 11% (-2)
#Piraten|G/EFA: 10%
#ADR|ECR: 7% (-0.5)
#DL|LEFT: 5% (+1)
#Fokus|Pragmatic centre: 1% (-1)
#KPL|Far left: 0.4% (-0.1)
#Volt|G/EFA: 0.3% (-0.1)
Fieldwork: 7-16 August
+/-: vs. 23 March-6 April
Respondents: 1887
I've been working on a #json to #csv converter.
There are several options just a Google away, but they often freeze on large files or break on complicated schemas. And then there's the trust factor.
The requirements:
- Works with large and complicated JSON files
- Open source, so you can trust that it's not saving your data nefariously
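For a feel of what "complicated" means here, one common approach (not necessarily the one this converter takes) is to flatten nested objects and arrays into dotted column names. The sketch below assumes a JSON array of records; `flatten` and `json_records_to_csv` are hypothetical helper names, and everything is held in memory, so it does not address the large-file requirement.

```python
import csv
import io
import json


def flatten(value, prefix=""):
    """Flatten nested dicts/lists into one dict with dotted keys."""
    flat = {}
    if isinstance(value, dict):
        for key, item in value.items():
            flat.update(flatten(item, f"{prefix}{key}."))
    elif isinstance(value, list):
        for index, item in enumerate(value):
            flat.update(flatten(item, f"{prefix}{index}."))
    else:
        flat[prefix[:-1]] = value  # drop the trailing dot
    return flat


def json_records_to_csv(records):
    """Write flattened records as CSV; missing columns are left empty."""
    rows = [flatten(record) for record in records]
    # Union of all keys, in first-seen order.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    out = io.StringIO(newline="")
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


sample = json.loads(
    '[{"id": 1, "user": {"name": "Ada"}, "tags": ["a", "b"]},'
    ' {"id": 2, "user": {"name": "Bob", "city": "Oslo"}}]'
)
csv_text = json_records_to_csv(sample)
print(csv_text)
```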
Today's problem is simple:
I have a CSV file containing a column of dates in dd/mm/yyyy format.
I need to reformat them as yyyy-mm-dd so I can feed them to a script that uses these dates to generate an .ics file.
It used to work just fine, but I forgot to document how I was formatting the dates 😕
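One way to do this (a sketch, not the lost original method) is to parse each value as day/month/year and write it back in ISO order. The column name `date`, the comma delimiter, and the sample rows are assumptions.

```python
import csv
import io
from datetime import datetime


def convert_dates(csv_text, column, delimiter=","):
    """Rewrite one column from day/month/year to year-month-day."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""), delimiter=delimiter)
    out = io.StringIO(newline="")
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, delimiter=delimiter)
    writer.writeheader()
    for row in reader:
        # strptime raises ValueError on malformed dates, which surfaces bad rows early.
        parsed = datetime.strptime(row[column], "%d/%m/%Y")
        row[column] = parsed.strftime("%Y-%m-%d")
        writer.writerow(row)
    return out.getvalue()


# Hypothetical sample data.
sample = "date,event\n25/12/2023,Noël\n01/02/2024,Réunion\n"
converted = convert_dates(sample, "date")
print(converted)
```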
[#GOOGLE|#CSV] New from Google: CSV files are now indexable! (by Abondance)
But for now, filetype:CSV and ext:CSV don't return any results.
https://www.abondance.com/20230828-141117-nouveaute-google-fichiers-csv-indexables.html

#HowToThing #008 — CSV parsing & filtering into structured data via https://thi.ng/csv and creating a multi-plot data visualization via https://thi.ng/viz (along with a range of other helpful packages for various side aspects).
The attached visualization shows a lin-log plot of new COVID cases between March 2020 - Dec 2021:
- Daily world total as line plot
- UK (red) and USA (blue) cases as interleaved bar plots
(All data from: https://ourworldindata.org/coronavirus)
Full source code:
https://gist.github.com/postspectacular/6a379a2bb8cd46e242163b9c9563522f
#ThingUmbrella #Transducers #TypeScript #JavaScript #DataViz #CSV #SVG #Tutorial
"Crosswalker is a general purpose tool for joining columns of text data that don't perfectly match."
"The tool auto-ranks matches for each data row ...
The tool auto-matches values that are practically identical
The results are presented in an interactive spreadsheet from which you can manually continue matching
As you go, the columns are resorted to highlight the most probable remaining matches"
https://github.com/washingtonpost/crosswalker
see other cool things: @dylan
https://github.com/freedmand
Interesting.
I mean. The link when finalized *is* correct, as seen here:
https://ry3yr.github.io/OSTR/Diarykeepers_Homepage/dekugames.html
The collection.json works as you'd expect
But it seemed #weird at first glance.
How does #csv handle this ?
#fediverse's stats
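A user by that name posts updates on the state of the #Fediverse. They used to publish their crawl results as #csv, but haven't done so since 6/18.
Why? I'd like them to publish it again, because I want to check the consistency of my own crawl. (I know I'm being rather demanding here.)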
#CSV darling, did you also have trouble filling your lists? A candidate from Manternach in the Centre constituency.
New #OpenStreetMap tool from me: `osmchangesets2csv`.
Turn the #OSM changesets dump file (which is XML) into an easy to use CSV file.
🔗 <https://github.com/amandasaurus/osmchangesets2csv>
#csv #cli #unix
I was so annoyed #Amazon killed its #CSV downloads that I decided to replace it with a #Coda workflow. Then I went NEXT LEVEL and made my first #firefox extension —it pulls Amazon invoice info and saves it to Coda. Then I added a whole workflow for #Canadians who are doing #crossborder shopping.
I’d welcome feedback, testing etc. <All the disclaimers> because thanks to #GPT this is way more complex than anything I’ve ever coded. https://coda.io/@awsamuel/amazon-to-csv
Documentation on RTD is also up to date for xtab.py, un-xtab.py, tklayout.py, tkpane.py, execsql_upsert, execsql_glossary, and chkcsv.py.
The execsql_upsert and execsql_glossary scripts are hosted only on OSDN, so they are currently not accessible for download. If the problem with OSDN continues much longer, I will switch to a different hosting service.
2/2
Prompted by recent events, here is a blog post on how to use #QGIS to create a map from #CSV files containing #geographic information.
Recommended #opensource #file #duplicates detection and deletion: #rmlint
Why? - Extremely fast · #CLI · Candidate file filtering by #name, #size, #modification #time · Configurable criteria for determining the original file · Paranoia mode offered (byte-by-byte comparison) · Flexible output #formats, including #bash deletion script, #json, #CSV · Excellent #documentation and #tutorials
https://github.com/sahib/rmlint
More recommendations: https://tuxwise.net/recommended-software/
I have cooked up a little Firefox extension + Coda.io template to replace the defunct #amazon #CSV downloader. I’d love to find some folks to kick the tires.
But be warned that if you are a #canadian who loves #spreadsheets and does crossborder runs to pick up Amazon packages in the United States, this little workflow might make you fall in love with me, and I am already taken.
^^ A few scratches in the paint
💥🆕🔨 I threw together a simple tool for hammering arbitrary JSON fields into nice regular CSV records...
➡️ https://github.com/instantiator/json-to-smart-csv
I'll bet there are other tools out there that do this. For anything complex you're probably better off writing your own script. Hopefully this tool is helpful in simple cases where you want to quickly convert a bunch of data and don't want to have to write code to do it...
Now that #BEAR 2.0 supports #TABLES, the following link takes you to an online tool that helps you convert your #CSV into #MarkDown:
https://www.convertcsv.com/csv-to-markdown.htm
Then simply copy the result and paste it into your Bear note.
Dear frontend community, I’ve created an awesome list for frontend watch and included a CSV file for Mastodon, LinkedIn, and Twitter accounts. If you’d like to have your account added to this list, please respond to this toot. :blobfoxflooftea:
https://github.com/Axolotat/awesome-frontend-watch
Could you please #retoot this post to help me get more visibility?
Have a great day! :revblobfox:
How could I miss out on #VisiData for so long? This might become my new favorite #CLI tool.
If you do _anything_ with data and enjoy working in the terminal, check it out. It can
• provide a #TUI for viewing and editing data in #CSV, #Excel, #SQLite, #JSON, #YAML & #XML files and quite a few more
• sort, filter, join and edit that data, across files and across formats
• convert between the formats (interactively or not)
• record & play macros
• be scripted in #Python
@MrHedmad If I interpret your problem correctly, you probably don't want to use Parquet or Arrow for this (as others have suggested), because your operation is fundamentally row-based and not column-based.
You can use the `csv` crate in a "streaming" fashion by reusing your `StringRecord` in every iteration of row parsing:
https://docs.rs/csv/latest/csv/struct.Reader.html#method.read_record
Something like this:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=3f6ba982764f8e97a20bf718d4546f25
It will basically only require memory for one row at a time.
1/2
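The same row-at-a-time idea, sketched in Python rather than Rust: iterating over `csv.reader` keeps only the current row in memory. Unlike the reused `StringRecord`, Python allocates a fresh list per row, so this illustrates the streaming pattern, not the allocation trick. The column names, sample data, and the `sum_column` helper are hypothetical.

```python
import csv
import io


def sum_column(lines, column):
    """Stream rows one at a time and sum a numeric column."""
    reader = csv.reader(lines)
    header = next(reader)
    index = header.index(column)
    total = 0.0
    for row in reader:  # each row is dropped once the loop moves on
        total += float(row[index])
    return total


# Hypothetical sample; for a real file use open(path, newline="") instead.
sample = io.StringIO("gene,count\nA,1.5\nB,2.5\nC,4\n", newline="")
total = sum_column(sample, "count")
print(total)
```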
🎉🆕 blog post! A Netflixstory
How to make sense of your Netflix viewing data.
➡️ https://instantiator.dev/post/netflix-history/
#netflix #tv #watch #data #visualisation #csv #jupyter #python #pandas
Since #Postbank no longer offers #CSV, I've cobbled together a truly nasty program that takes a PDF account statement after OCR and, by sheer brute force, conjures up a CSV from it. All it has to go on are the separators in the date, spaces, LF, CR, and the + and - on the amount, and it has to guess whether each individual record might be complete.
WTF. What a load of crap. No way!
This is pointless. Sorry, but it is.
#CSV is a digital text format. It's meant to be readable by human eyeballs, as well as machines.
Building a self-describing binary format as a replacement for CSV defeats the purpose of CSV. Finally, you shouldn't be using CSV for 10 million rows of data. Use a proper database. Even better, use an existing binary, self-describing format like NetCDF or HDF based on _decades_ of development and experience.
I'm curious what the de facto best practice is for converting two different record classes to CSV? Should the first column be a 'type' column? Should the record types share columns with common names, or should record type A occupy columns 1..N and record type B occupy columns N+1..M?
#csv
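To make the first option concrete, here is a sketch of the "type column plus shared column names" layout. It is one possible layout, not an established best practice; the `Customer` and `Order` classes and the `write_mixed` helper are made up for illustration.

```python
import csv
import io
from dataclasses import asdict, dataclass


@dataclass
class Customer:
    id: int
    name: str


@dataclass
class Order:
    id: int
    customer_id: int
    total: float


def write_mixed(records):
    """Write mixed record types with a leading 'type' discriminator column."""
    fieldnames = ["type"]
    rows = []
    for record in records:
        row = {"type": type(record).__name__, **asdict(record)}
        # Columns with the same name are shared; new ones are appended.
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
        rows.append(row)
    out = io.StringIO(newline="")
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


mixed = write_mixed([Customer(1, "Ada"), Order(7, 1, 19.5)])
print(mixed)
```

A reader can then dispatch on the `type` column and ignore the columns that are empty for that type. The trade-off is a sparse file when the record types share few fields.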
New release of #qsv, the CSV toolkit, is out! 🎉
The `diff` command now sorts by line when no other sort option is given (before, the order of the diff result was not stable across runs). 🧮 📃
This release also introduces a new command `joinp` - the first command that is powered by pola.rs! 🚀
Check the full release notes here:
https://github.com/jqnatividad/qsv/releases/tag/0.90.0
#CSV #CLI #Terminal #Rust #RustLang #OpenSource #Data #DataScience #Polars
I just published a new blog post on building a CSV file using Python for analyzing code files in a software engineering codebase.
A new version of csv-diff is out (v0.1.0-beta.2) 🎉
https://lib.rs/crates/csv-diff
This version adds a method, which allows you to sort your diff result by columns (it was already possible to sort by lines).
See the changelog for an example:
https://gitlab.com/janriemer/csv-diff/-/blob/8642a8a7ba14e22d076cee8c3f690c17f41d7528/CHANGELOG.md#010-beta2-19-february-2023
Sorting by columns will soon be integrated into qsv, the #CSV toolkit:
https://github.com/jqnatividad/qsv/issues/714
Thank you @jqnatividad for the idea of this feature! 💚
Yesterday I saw a nice stream by @starbuxman about Spring Batch and analyzing a video game sales data set and I thought: Can we do it with less software? We can.
Analyze the "vgsales.csv" data set with sqlite3 and curl alone:
https://gist.github.com/michael-simons/c2a8e639e540ce928e968d5c1ab8e181
Description of the dataset here https://www.kaggle.com/datasets/gregorut/videogamesales
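The gist does this with the `sqlite3` command-line shell and `curl`. As a rough equivalent using Python's built-in `sqlite3` module, the sketch below loads CSV rows into an in-memory table and aggregates them with SQL. The three sample rows are placeholders, and the column names are assumed to match the data set's header.

```python
import csv
import io
import sqlite3

# Placeholder rows, not real figures from the data set.
sample = "Name,Platform,Global_Sales\nGame A,Wii,10.0\nGame B,Wii,5.5\nGame C,NES,7.25\n"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vgsales (name TEXT, platform TEXT, global_sales REAL)")

# Stream the CSV rows straight into the table.
reader = csv.DictReader(io.StringIO(sample, newline=""))
conn.executemany(
    "INSERT INTO vgsales VALUES (?, ?, ?)",
    ((r["Name"], r["Platform"], float(r["Global_Sales"])) for r in reader),
)

# Total sales per platform, highest first.
totals = conn.execute(
    "SELECT platform, SUM(global_sales) AS total "
    "FROM vgsales GROUP BY platform ORDER BY total DESC"
).fetchall()
print(totals)
conn.close()
```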
@bobwyman @hrheingold @blinry @datasniff is simple to use:
1. Install from store
2. Visit a hyperlink that denotes an #HTML, #JSON, #JSONLD, other #RDF doc types (e.g., #RDFTurtle, #RDFXML), #CSV, #RSS, or #atom
3. Click on the "doggie" icon and it will sniff out the structured data and present a property-sheet based UI
Once the #StructuredData is sniffed out, you can download to your filesystem, #WebDAV or #LDP compliant #DataSpace (or Solid Pod), or a #DBMS (or store) that supports #SPARQL.
I just now finally got around to migrating my #follows, #followers, and #lists from #Twitter to #Mastodon. It was super easy, and the feeling of #liberation is palpable.
I used #Debirdify (https://debirdify.pruvisto.org/), and I highly recommend it. You simply login on Debirdify with your #birdsite account, select which lists of people you want to migrate, export them as one or more #CSV files, and then import those files in your Mastodon preferences under "Import and export."
Note: The #API that Debirdify uses to export your data for you will be shut off on Thursday, February 9th, and #SpaceKaren might cut it off even sooner than that if he feels like it, so if you've been #procrastinating on this (like I was), get on it immediately!
@bahnkundenv The mere fact that they rely on #Excel [#OOXML] here instead of open solutions and standards such as #OpenDocument or #CSV / #TSV says everything about the state of #DB...
@noamross
I was a bit slow to realize, but I think tad by @antony might just check all the boxes.
https://www.tadviewer.com/
https://github.com/antonycourtney/tad/
#csv, #parquet, #sqlite, #duckdb, #GUI
@alcinnz Wow, this is so cool seeing #Rust being integrated into more and more tools!
This announcement actually inspired me to create an issue in csv-diff, the fastest CSV diffing library in Rust 🚀 , to evaluate creating a csv diffing extension for sqlite.
Not a priority for now, but maybe someone is willing to help?
https://gitlab.com/janriemer/csv-diff/-/issues/12
Thank you for sharing the article. ❤️
@hyde Also check out `qsv`. 🙂
It's an actively maintained fork of xsv (xsv is not maintained anymore).
qsv is _very active_ in development.
And a shameless plug at the end 😁
Just a few days ago, `csv-diff` got merged:
https://github.com/jqnatividad/qsv/pull/711
csv-diff is a crate for comparing CSVs with ludicrous speed:
https://gitlab.com/janriemer/csv-diff
So the new command `qsv diff` is now the fastest #CSV differ in the world! 🚀
What is this demonstrating, utility wise?
The fact that I can use a #SmartAgent like #gptChat to generate machine-computable structured data for upload to a knowledgebase (or #KnowledgeGraph).
Naturally, I can query said Knowledge Graph declaratively using #SQL, #SPARQL, or #GraphQL where query solutions also manifest in easy-to-reuse form, e.g., #JSON, #CSV, etc.
We are now in the Smart Agent stage re notion of a #SemanticWeb!
@ademalsasa You're welcome - glad you like it! ❤️
Hmmm... 🤔
Maybe have a look at "Time Till Open Source Alternative" list by @staltz
https://staltz.com/time-till-open-source-alternative.html
It lists open source alternatives to proprietary software (along with the duration it took until that OSS alternative emerged).
Raw data as #CSV can be found on GitHub:
https://github.com/staltz/ttosa
Also, thank _you_ for your regular lists of #FLOSS software! ❤️
Once data exists in structured form its reusability increases immensely.
For instance, I can construct a #CSV #URL scoped to my #GoogleSpreadsheet (https://docs.google.com/spreadsheets/d/18Pi1AeQezbTdjjPcb6ol0Rxwx-hq5JkA4RoPsTapPqw/gviz/tq?tqx=out:csv&range=A1:AA729&sheet=ConsolidatedFollows) that functions as a Data Source Name (#DSN) for automated conversion into a #KnowledgeGraph using terms defined in a variety of vocabularies.
Here's a #SPARQL Query against said Knowledge Graph.
Just click to explore.
@mastodonmigration @tchambers @RyanGerbosi @cmonstah @DeanObeidallah You can go through the doc and add just the journalists you like, or if you want to follow all 1000+ journos on the list, @markhenick made a #CSV file that you can import straight into Mastodon. You can get that file, and instructions on how to use it, here:
More journalists join the list every day, so you might want to re-import it periodically.
Want more than journalists? Here is an auto-updating list of famous people on Mastodon. No #Kardashians yet, but it's got everyone from @neilhimself to @georgetakei
And here is a great list of #academics from all fields:
https://github.com/nathanlesage/academics-on-mastodon
And lastly, a huge list of #resources that can help you figure out Mastodon:
https://researchbuzz.me/2022/11/05/a-big-list-of-mastodon-resources/
Announcing Exodus, a #GitHub Action that helps Twitter #communities find members on #Mastodon.
Searches lists, hashtags, account followers & more for Mastodon addresses in name, bio or pinned tweet, then exports the results to a #CSV file.
Wondering if anybody has been working on a project/tool to #convert your #twitter #archive to a format suited for long term storage, perhaps using something like #sqlite, #csv etc., or even convert to a museum* format you could #host on your own site for historical/self-contained-archive purposes.
Pinging @simon here as it could be an interesting project/tool idea to be implemented with @datasette! And perhaps somebody is already working on it as we speak.
OK, how about a Sunday #sameintro #introduksjon?
Hi, my name is Siri! I live in #MoiRana and work on the Sámi bibliography at #Nasjonalbiblioteket (the National Library of Norway), so I occasionally nerd out about #samisklitteratur (Sámi literature) and share #ukastilvekst (this week's acquisitions) roughly every Friday. I'm single and childless, and siessá [aunt] to my little siessál and the mini-siessáls.
Interested in #teater (theatre), #kunst (art), and high culture in general; I love hanging out at #prïhtjegåetie (read: #kaffebarer, coffee bars). Born and raised in #Tromsø, but my heart belongs to Tana.
And I'm always #ČSV :sami3: