Is there a 'tidy' approach to splitting data from text into columns, where each 'vector of text' does not contain the same number of elements?

I'm having trouble where stringr::str_view will recognize the string I want to split on, but I can't get tidyr::separate to separate the data properly. I would assume, as I want to split where three spaces occur, that the easiest way would be to simply specify the spaces in brackets, but I don't think tidyr likes that?

    library(stringr)
    # but I can't split on it with tidyr's separate
    separate(data, value, into = c("Ratings","Weighted Avg","IBU","Est Calories","abv"), sep="", fill = "right")
    separate(data, value, into = c("Ratings","Weighted Avg","IBU","est Calories","abv"), sep= " ", extra = "merge")

You can use tidyverse tools to get a fixed length column to separate. I would do it like this, using purrr to help deal with list results from stringr. Each line of the data splits into pieces such as "RATINGS: 4", " MEAN: 3.83/5.0" and " WEIGHTED AVG: 3.39/5".

    reprex::reprex_info()
    # you can apply the previous to each row using map
    # then unnest the column before further data prep
    # you can now separate in a fixed 2 length vector
    # then get the result in column with NA in cells where you did not have value in string

The result has one column per label (RATINGS, MEAN, WEIGHTED AVG, IBU, CALORIES), with NA wherever a line had no value for that label.

Without trying to step on toes, here's my understanding of how dealing with lists → vectors should typically fall out:

Thanks for these great suggestions! I missed it with the str_split result! Here is the new cleaner solution as a reprex, easy to copy-paste with reprex_clean():

    reprex::reprex_info()
    mutate(split = str_split(string, " ")) %>%

Thanks a lot!

There are a couple of issues with your first separate:

    separate(data, value, into = c("Ratings","Weighted Avg","IBU","Est Calories","abv"), sep="", fill = "right")

The sep in that call probably isn't doing what you intend. sep is a regular expression that is used to match the text between columns. A bracketed pattern is what regular expressions call a character class, and yours matches any single 's' or '/'. Repeating a character inside the brackets changes nothing, so the longer and shorter spellings are equivalent regular expressions. You may have meant a class built from "\s", but that would not have worked either: such a class is equivalent to a bare "\s" and matches a single whitespace character, not a run of three.

Another issue is that sep matches specifically the text between columns, so the "RATINGS" at the beginning of each line is treated as coming after a zero-width column, because no value precedes it.

Another thing about separate is that it cannot figure out where columns belong or what the column names are; it just separates things in the order it finds them. fill just tells it what to do if it cannot find enough pieces of data for all the column names listed in into.

The following works. It probably isn't what you want, but it may give you a better idea of how separate works.

    suppressPackageStartupMessages(library("tidyverse"))
    into = c("skip", "Ratings", "Mean", "Weighted Avg","IBU","Est Calories","abv"), # the "skip" column catches the empty column that precedes "Ratings:"
    select("Mean", "Weighted Avg", "IBU", "Est Calories", "abv")
    #> Mean `Weighted Avg` IBU `Est Calories` abv

BTW the sep value is as complicated as it is because the separator text has the format that it does. separate doesn't know which column a piece of data it finds belongs to; it just distinguishes separator text from values. In the end it just has the values, in order, that it found in each line.
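For anyone who wants to see how separate treats sep, the leading label and fill, here is a minimal sketch. The two-line sample data, the `label_sep` pattern and the shortened column list are all assumptions made up for illustration; they are not the data or the separator from the thread.

```r
library(tidyr)

# Hypothetical sample in a "LABEL: value" layout; not the thread's data.
df <- data.frame(
  value = c("RATINGS: 4   MEAN: 3.83/5.0", "RATINGS: 5"),
  stringsAsFactors = FALSE
)

# Assumed separator: optional whitespace, an upper-case label, a colon,
# optional whitespace. It matches the text BETWEEN values, label included.
label_sep <- "\\s*[A-Z]+:\\s*"

out <- separate(
  df, value,
  into = c("skip", "Ratings", "Mean"),  # "skip" catches the empty leading piece
  sep = label_sep,
  fill = "right"                        # pad short rows with NA on the right
)

print(out)
```

Because the first label sits at the very start of each line, the piece before it is an empty string, which is why a throwaway first name is needed in into. The second row has no MEAN entry, so fill = "right" leaves that cell as NA instead of raising a warning. Note that separate assigns pieces purely by position: if a line skipped a label in the middle, the later values would shift left into the wrong columns.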