#Rstats
CRAN updates: maptools #rstats
📦 geofacet
📝 'ggplot2' Faceting Utilities for Geographical Data
🔗 https://cran.r-project.org/web/packages/geofacet/index.html
CRAN updates: ucminf #rstats
CRAN updates: echarty gdalraster ggdag httptest lingtypology nanonext #rstats
New CRAN package crosstable with initial version 0.6.2
#rstats
https://cran.r-project.org/package=crosstable
Has anyone ever figured out the CSS to change a {pkgdown} site version number? I cannot find the incantations.
@rfortunes @koantig I enjoyed them while they lasted. Sorry to see it go. Maybe the #rstats fortunes package needs to be updated to remove what didn’t age well and add some newer more relevant quotes.
CRAN updates: hpfilter #rstats
2018 Central Park Squirrel Census for this week's #TidyTuesday
code: https://github.com/gkaramanis/tidytuesday/tree/master/2023/2023-week_21

New version of #ggnewscale now fixes a long-standing bug that was introduced a few versions ago and was really hard to debug. Now you can actually add more than 2 scales if the scale is implicit in the geom/stat.

Trying out Meld, a simple but incredibly useful tool for tracking changes in code side-by-side…
App: https://meld.app/
Homebrew installation on OSX: https://formulae.brew.sh/cask/meld

Released a minor update {collapse} v1.9.6, which, notably, includes a new vignette on how {collapse} handles R objects - a quick view behind the scenes of its class-agnostic R programming framework: https://sebkrantz.github.io/collapse/articles/collapse_object_handling.html #rcollapse #rstats
My talk for @ShinyConf #ShinyConf2023 "Baking JavaScript into a #RShiny Package" is up on the @appsilon YouTube! https://youtu.be/QLvd-q3c0Xc
Check it out, along with my {cookies} #RStats package! https://r4ds.github.io/cookies/
This looks intriguing. Hopefully there's a grammar available for #rstats
#TidyTuesday week 21: Squirrels 🐿️
Dabbled in some #GenAI this week. The tree and all squirrels were generated using Dalle. Not perfect but kinda cool.
Bit iffy on what the actual data represents though
🔗 http://github.com/doehm/tidytues
#Rstats #dataviz #r4ds #ggplot2

CRAN updates: dogesr #rstats
CRAN updates: collapse ergm sentopics #rstats
R doesn't need to be a hard and scientific tool 📈. You can use it to make art 🎨: https://github.com/cutterkom/generativeart #rstats #generativeart #art @cutterkom
CRAN updates: RcmdrPlugin.WorldFlora #rstats
CRAN removals: censusxy dynamite eiCompare glmnetr #rstats
CRAN updates: augmentedRCBD #rstats
Week of May 15-21 of #MapPromptMonday, Proportional Symbols
Code: https://github.com/gkaramanis/mappromptmonday/tree/master/2023/2023-week_20

CRAN updates: ODRF #rstats
An R veteran of over 8 years and a total chump:
I've only just learned to use `dev.new()` to S T O P that annoying 'Plots' pane that barely ever functions correctly and sometimes just straight errors complex plots on my tiny laptop screen. Hello, `dev.new()`.
{valve} now has
- auto-connection pooling
- auto-termination
So that's auto-scaling your plumber API.
:rstat: 🦀
This is such a fun project and I'm learning so so so much.

📦 ggroups
📝 Pedigree and Genetic Groups
🔗 https://cran.r-project.org/web/packages/ggroups/index.html
CRAN updates: babelmixr2 etwfe FIESTAutils measr treediff #rstats
CRAN updates: photobiology #rstats
The other day I saw a guy at the gym taking a picture of his sweat all over the floor, probably to share on Instagram or whatever. At first I was judgy, thinking "why would you share something so dumb". But then I realised that when I write #rstats code that I'm proud of, I like sharing it with the community and I guess it must be similar for him and the workout scene, so I decided I was the asshole for judging him.
CRAN updates: mzipmed SSBtools #rstats
In the fall, I'm doing an independent study on data exploration in #rstats with one of my former students. I'm primarily planning on using the #tidyverse, but are there any other packages out there that might be useful to play around with? I'd rather have too many ideas than not enough.
Minor release for my ntdr #rstats package. Mostly internal changes and some documentation updates to reflect the new terminology being used in the #NationalTransitDatabase https://vgxhc.github.io/ntdr/index.html
It'll probably be a long while until I wrap my arms around #MicrosoftFabric. I'm not expecting us on-prem (or hybrid) folks to find much joy in it.
I read an article about Data Lake parquet file optimization that was interesting. It's kinda cool tbh. But my expectations haven't budged yet.
Frankly, I'm still salty that #Microsoft won't give us column names with sp_execute_external_script.
The making of this week's #TidyTuesday recorded with {camcorder} in #RStats! 📹📷
YouTube: https://youtu.be/vobupMY8RpU

Add highlighting to your quarto presentation using the RoughNotation library: https://emilhvitfeldt.github.io/quarto-roughnotation/ #rstats #quarto #slides
I've said it before, but I'll keep saying it. THANK YOU to @sckottie and @maelle for all your work to make API packages in R possible. This is a gem of a resource, https://books.ropensci.org/http-testing/index.html and {vcr} is such a great tool to use. 🙏👏 #RStats
🗃️ Using walk() to write many files, feat. file-system navigation with {fs}:
🚶 "`purrr::walk()` this way" https://www.tidyverse.org/blog/2023/05/purrr-walk-this-way/ #RStats

Striking drop in annual growth of GDP per capita from 2019 to 2020 💸
A series of #dataviz|es as alternatives to two choropleth maps, comparing the trends per year as shared by Max Roser (OurWorldInData).
3️⃣ Dumbbell graph showing trends for countries with ~20M inhabitants
(plus a long version with all countries with a noticeable change in the replies)

What if we have forest cover over shaded relief? We get this map of El Salvador!
#rayshader adventures, an #rstats tale

🚰 Re-direct R-code output to files w/ sink():
📝 “What is the sink() function? Capturing Output to External Files” by Steven P. Sanderson
https://www.spsanderson.com/steveondata/posts/2023-05-23/ #RStats
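For anyone who hasn't used it, here's a minimal base R sketch of the idea. The temporary file is just a stand-in target I made up for illustration; it is not taken from the linked post.

```r
# Redirect printed output to a file, then restore the console.
log_file <- tempfile(fileext = ".txt")  # hypothetical target file

sink(log_file)                 # start diverting output to the file
print(summary(c(1, 2, 3, 4)))
cat("done\n")
sink()                         # stop diverting; output returns to the console

captured <- readLines(log_file)  # what ended up in the file
```

Calling `sink()` with no arguments is the important part: forget it and everything after keeps going to the file.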
A map showing the forest cover of Guatemala using data from 2019.
#rayshader adventures, an #rstats tale

🚰 :rstat: 🔧 {valve} is ready for some test users!
Parallelize your {plumber} APIs with literally one function.
Any brave souls willing to give it a shot? (particularly windows users)
👀 read the README to understand why this is so powerful!

My latest podcast interview on forecasting software with Fede Garza and Eric Stellwagen https://forecastingimpact.buzzsprout.com/1641538/12809499-forecasting-software-panel #rstats #forecasting
Today @isabelizimm and I have a new guide published on how to use vetiver 🏺 to store model metrics as ✨metadata✨, in either #rstats or #python:
https://vetiver.rstudio.com/learn-more/metrics-metadata.html
Was checking student code—for a bunch of data manipulation, matrixy stuff—and at the beginning started with:
library(tidyverse)
but only used %>% and filter(). I pointed out that he could get away with:
library(dplyr)
but, even easier, could use:
|> and subset(), for no tidyverse dependency whatsoever.
The code has 3 other dependencies (pedigree, asreml and rrBLUP), but why use more than you really need?
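A rough sketch of what that dependency-free version might look like, using the built-in `mtcars` data as a stand-in (the student's actual data and filter are not shown here, so the columns and condition are assumptions):

```r
# Base R only: native pipe (needs R >= 4.1) plus subset(), no packages attached.
cars_4cyl <- mtcars |>
  subset(cyl == 4 & mpg > 30, select = c(mpg, cyl, wt))

cars_4cyl
```

One caveat: the help page for `subset()` describes it as a convenience function for interactive use, so inside functions plain `[` indexing is the safer choice.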
Back to forest cover maps with one of México.
#rayshader adventures, an #rstats tale

Starting with the mechanics of understanding how LLMs work can help foster durable intuitions that will inform our usage of these models now & in the future. (Especially if the future is one where LLMs are a staple of the data scientist’s toolbox, as common as an lm() function call).
And what better way is there to learn than by doing!
Tomasz Kalinowski walks through an implementation of LLaMA, a Large Language Model, in R, with TensorFlow and Keras.
https://blogs.rstudio.com/ai/posts/2023-05-25-llama-tensorflow-keras/
@will I think @rOpenSci does a pretty good job with this. https://contributing.ropensci.org/ . #rstats
#TidyTuesday week 21 - Central Park Squirrels. Made an infographic this time with the help of #canva and R.
#dataviz #rstats

I asked Bing Chat the same question. Answer: dplyr, tidyr, stringr, lubridate, reshape2, data.table, magrittr, and purrr, each with a description, ex:
dplyr: "This package is used for data manipulation and is one of the most popular packages in R. It provides a set of functions that can be used to filter, arrange, group, and summarize data." and data.table: "This package is used for fast data manipulation and is especially useful for large datasets."
2/2
#rstats #AI #GenerativeAI

I just asked Google Bard: What are the best R packages for wrangling data?
Answer: dplyr, tidyr, data.table, reshape2, and stringr, each with 📦 details. Plus a comment that there are others and you can search CRAN to find more. 1/2

The {ggautomap} #rstats 📦 “provides #ggplot2 geometries that make use of cartographer, a framework for matching place names with map data. With ggautomap your input dataset doesn’t need to be spatially aware: The geometries will automatically attach the map data (providing it’s been registered with cartographer).”
https://cidm-ph.github.io/ggautomap/index.html
By Carl Suster, on CRAN
#RSpatial #GIS @rstats

find.not.numeric.value() is a function that finds where in a vector there are values that can't be converted to numbers. For example
find.not.numeric.value(myvector)
Part of the {infun} #rstats 📦 containing several R utility functions. By Mao Kobayashi
https://github.com/indenkun/infun
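To make the idea concrete, here's a base R sketch of how such a check could work. `which_not_numeric()` is a hypothetical helper written for illustration; it is not the {infun} implementation and may behave differently from `find.not.numeric.value()`.

```r
# Hypothetical helper, NOT the {infun} implementation: returns the positions
# of elements that are non-missing but cannot be converted to numeric.
which_not_numeric <- function(x) {
  converted <- suppressWarnings(as.numeric(as.character(x)))
  which(!is.na(x) & is.na(converted))
}

myvector <- c("1", "2.5", "abc", NA, "1e3", "4,2")
which_not_numeric(myvector)  # positions of "abc" and "4,2"
```

Values that are already `NA` are skipped on purpose, so only genuine conversion failures get reported.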
@metamattj Good to see you on here! Overall I like the experience here, it's just different (in a good way). It is easy to find the #rstats, #opensource, and #openscience folks!
Episode 123 of the @rstats @rweekly Highlights Podcast blends nicely with the R community! https://podverse.fm/episode/DmcMBmplP
🕸 HTTP Testing with R @maelle @RConsortium
💹 Introducing {ggblend} @mjskay
📆 Handling dates in R & Excel @AbrahamsAmieroh@twitter.com @jumpingrivers
Happy with your current podcast app but want to send a boost? You can do that directly on the Podcast Index! Find us at https://podcastindex.org/podcast/1062040
h/t @mike_thomas @batool664@twitter.com 🙏
Are you interested in how dependency-heavy your (or another) package is and why? https://github.com/jokergoo/pkgndep #rstats
I thoroughly enjoyed presenting my data-driven research on late Ottoman #Arabic #Periodicals at #DigHis23. The paper introduces stylometric authorship attribution for answering the question whether editors/publishers of magazines could or should be considered the authors of the bulk of anonymous texts in their periodicals. The method relies on collaborative work with Maxim Romanov on establishing parameters for reliable authorship attribution in Arabic for the `stylo()` package in #R (#Rstats).
Slides are available at https://tinyurl.com/dighis23-grallert
#MultilingualDH #DigitalHumanities #DigitalHistory #PeriodicalStudies #Stylometry
#!/usr/bin/env Rscript
## License: Share and Enjoy
Ult.QoL <- function() {
  require("towel")
  message("Don't panic!")
  # "Six by nine. Forty two. That's it. That's all there is."
  return(6 * 9)
}
if (Ult.QoL() == 42) {
  message("I always thought something was fundamentally wrong with the universe.")
}
#rstats question (or just #statistics really): I want some goodness of fit thing for an orthogonal distance regression (or total least squares regression). Since this can apparently be done by taking the first principal component (https://stats.stackexchange.com/questions/13152/how-to-perform-orthogonal-regression-total-least-squares-via-pca) we were thinking that `Rsquared = 1 - ((sum(eigenvalues) - max(eigenvalues)) / max(eigenvalues))` but we're not sure.
Apparently we could also get a t-statistic or a chi-squared value: https://stackoverflow.com/questions/21395328/how-to-estimate-goodness-of-fit-using-scipy-odr
Does anyone have any opinions about this? @lakens maybe?
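Not an answer to which measure is right, but here's a sketch for experimenting, assuming simulated data and `prcomp()` for the PCA. It reads the formula in the question as one minus the ratio of the remaining variance to the largest eigenvalue, and sets it next to the usual share of variance captured by the first component. All variable names are made up for illustration.

```r
# Sketch: orthogonal (total least squares) fit via PCA on simulated data.
set.seed(42)
x <- rnorm(200)
y <- 2 * x + rnorm(200, sd = 0.5)

pca <- prcomp(cbind(x, y))     # centers the data by default
eigenvalues <- pca$sdev^2      # variance along each principal axis

# Slope of the TLS line from the loadings of the first principal component.
tls_slope <- pca$rotation["y", 1] / pca$rotation["x", 1]

# Share of total variance captured by the first component.
explained <- max(eigenvalues) / sum(eigenvalues)

# Candidate from the question: 1 - (remaining variance / largest eigenvalue).
candidate <- 1 - (sum(eigenvalues) - max(eigenvalues)) / max(eigenvalues)

c(slope = tls_slope, explained = explained, candidate = candidate)
```

Because the largest eigenvalue can't exceed the sum, the candidate is never larger than the explained share, so the two will only agree when the fit is perfect.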
Curious about trying #rstats but all you know is values in a #spreadsheet? Check out this interview for an insight into what Beyond Spreadsheets with R can do for you https://freecontent.manning.com/interview-beyond-beyond-spreadsheets/
One reason I love the markr approach is that by having all the feedback in one place, before generating the feedback documents, I can review the feedback easily across the cohort and then update and add more general feedback for all to the feedback documents based on any themes I hadn’t quite spotted in the individual feedback. Likewise, it is great for developing my own future teaching as it becomes easier to see what students understood well or not so well.
@rfortunes Hi, I am surprised by the reactions of some #rstats followers to these enjoyable fortunes. Maybe you could, I mean if a human gets this message, add to the posts from the bot some indication that these quotes reflect the changing attitudes through the history of R. I guess those who do not recognize the names of the authors, or are not aware how much more difficult it was to create, document and maintain a package or even debug a script in those days, simply miss the point.
Striking drop in annual growth of GDP per capita from 2019 to 2020 💸
A series of #dataviz|es as alternatives to two choropleth maps, comparing the trends per year as shared by Max Roser (OurWorldInData).
2️⃣ Slope graphs per country as tile grid map showing trends for each of the 186 countries featured.
🛠️ #rstats + #ggplot, in combination with the #geofacet ️📦 and #Figma to add the "how to" section + continent labels
![A geofacet (a tile grid map that represents the [most] countries of the world as equally sized rectangles, mimicking the original geographic topology as closely as possible) with colored slope charts for each country. The world map is overall very red highlighting that "average income has decreased in almost all countries from 2019 to 2020, with GDP per capita growth turning from positive to negative in most cases".
Each rectangle (slope chart) features a country code and continent labels are added to help to distinguish the original spatial shape. A "how to interpret the slope chart" box showcases how to read the chart by using the trend for Great Britain, summarizing that: "From 2019 to 2020, the annual growth of GDP per capita of Great Britain has decreased by approx. 10 percentage points, turning from positive to negative."](https://cdn.masto.host/frontendsocial/cache/media_attachments/files/110/427/721/747/437/778/small/fe7109823af85125.png)
Striking drop in annual growth of GDP per capita from 2019 to 2020 💸
A series of #dataviz|es as alternatives to two choropleth maps, comparing the trends per year as shared by Max Roser (OurWorldInData).
1⃣ Slope graph showing trends for 196 countries, overall and split per continent

The {pins} #rstats 📦 now supports reading and writing parquet files! 🎉Via @juliasilge
https://posit.co/blog/announcing-pins-1-2-0/
A population density over shaded relief map of Iran. I rather like how this turned out. What do you think?
#rayshader adventures, an #rstats tale

Need some data to test a plot idea or algorithm? On https://drawdata.xyz/ you can draw the data you want... #rstats #synthetic #dataviz
The {appeears} #rstats 📦 "provides easy access to the AppEEARS API directly from R", according to pkg author @koen_hufkens. That API lets you subset and download geospatial datasets from several different US gov't sources. (A free NASA Earth Data account needed.)
Pkg info: https://bluegreen-labs.github.io/appeears/
#rstats #RSpatial #GIS @rstats
A population density over shaded relief map of Japan. Was pleasantly surprised to spot Mount Fuji so I added a label.
#rayshader adventures, an #rstats tale

We are hiring a Data Scientist II at The Prostate Cancer Clinical Trials Consortium (PCCTC)! 🎉We are excited to grow our Data Science team, and we look forward to working with #rstats programmers 🛠️with an interest in clinical research.
Please apply if you are interested! https://bit.ly/3IySWXk
The #ggblend #rstats package is now on CRAN!! https://mjskay.github.io/ggblend/
ggblend is a small algebra of operations for blending, copying, adjusting, and compositing layers in ggplot2
One problem it solves is making plots independent of draw order: e.g. by using commutative blends, like "lighten" or "multiply"


I've made a lightweight glossary #rstats package for quarto and R Markdown documents. You just tag words in your text like `r glossary("term")` and create a glossary table at the end of the section with `glossary_table()`. The definitions can be set in each glossary() function, or pulled from a YAML file.
I'm hoping to submit to CRAN soon, but would love if anyone had time for a quick test and feedback.
“Knowing just a bit of HTML and CSS can unlock the full potential of R tools like {gt}, {ggtext}, {shiny}, and Quarto.”
Video: HTML and CSS for R Users - presentation at the Harvard Data Science Initiative R User Group by Albert Rapp
https://youtu.be/Y80iGc5Vjyc
Blog post: https://albert-rapp.de/posts/16_html_css_for_r/16_html_css_for_r.html
#rstats @rstats #css #QuartoPub #quarto @rappa753 #RShiny
“A brand-new version of the #rstats tidycensus package is now on CRAN, supporting the brand-new 2022 Population Estimates (which you can't get from the API). Download the new version today and start making charts like this!” - 📦 author Kyle Walker
https://walker-data.com/tidycensus

Tornados for this week's #TidyTuesday
code: https://github.com/gkaramanis/tidytuesday/tree/master/2023/2023-week_20
The #rstats code is available on my blog: https://juliasilge.com/blog/tornadoes/
Helped someone debug some tidyverse data processing issues. It turns out "NA" was a legitimate code used in their data and readr by default interprets it as NA, not a string. Careful folks! #rstats
Edit: for anyone who doesn't know, `read_csv()` has an `na` parameter. The default is `na = c("", "NA")`. Setting it to `na = ""` fixed the issue.
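The same trap exists in base R, where `read.csv()` defaults to `na.strings = "NA"`. Here's a small sketch using the base reader rather than readr (my substitution, so it runs without extra packages) and a made-up two-row CSV:

```r
# Base R analogue of the readr issue; the CSV content is invented for illustration.
csv_text <- "code,value\nNA,1\nUS,2"

default_read <- read.csv(text = csv_text)                   # "NA" becomes missing
literal_read <- read.csv(text = csv_text, na.strings = "")  # "NA" kept as text

is.na(default_read$code)
literal_read$code
```

With readr, the equivalent switch is the `na` argument mentioned in the edit.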
A forest cover map of Uruguay using data from Copernicus Land Cover 2019.
#rayshader adventures, an #rstats tale
Is there a straightforward way in #rstats to add Interstate and highway icons to a ggplot map?
I'd totally forgotten about this quirk of #rstats functions: arguments are not evaluated until they are used, so if argument b defaults to the value of argument a, you need to use argument b in the code before you make any changes to a (or of course don't change a)
((I spent half an hour debugging something due to this today))
x <- function(a, b = a) {
  a = 1
  return(b) # first use of b sets the value b = a = 1
}
x(2) # returns 1
y <- function(a, b = a) {
  b # sets the value of b = a = 2
  a = 1
  return(b)
}
y(2) # returns 2
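If a bare `b` on its own line looks too much like a leftover, base R's `force()` does the same job and signals the intent. A sketch along the lines of `y()` above (the function name `z` is made up):

```r
# force() evaluates the argument immediately, before a is modified.
z <- function(a, b = a) {
  force(b)  # b is fixed to the current value of a
  a <- 1
  b
}

z(2)  # returns 2
```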
Git Version Control and RStudio
@NHSrCommunity webinar on Thursday, May 25
3:30 pm – 4:30 pm BST / 10:30 am EDT
With Ryan Johnson – Data Science Advisor, @Posit
No one, and I mean absolutely no one, ever wants this. Why does every installer default it to be on? #rstats
The {lehdr} #rstats 📦 is designed to “query Longitudinal Employer-Household Dynamics (LEHD) US workplace/residential association and origin-destination flat files and optionally aggregate Census block-level data to block group, tract, county, or state.” By Jamaal Green & others
The LEHD data server “is a unique, and under utilized, tool in the spatial social sciences because getting bulk data is unwieldy...lehdr solves this”
https://cran.r-project.org/web/packages/lehdr/index.html
#USCensus #Census #GIS #RSpatial