library(tidyverse)
library(stringr)
stringr没有被包括在tidyverse中
basic
在字符串中引用符号,需要加\
特殊字符:
- \n newline
- \t tab
String length 字符长度
str_length(c("a", "R for data science", NA))
## [1] 1 18 NA
Combining strings 字符合并
str_c("x", "y")
## [1] "xy"
str_c("x", "y", "z")
## [1] "xyz"
str_c("x", "y", sep = ", ")
## [1] "x, y"
将所要合并的向量作为第一个参数传入,需要用到collapse参数
str_c(1:10,collapse = "")
## [1] "12345678910"
str_c是向量化的,对于短向量会自动循环补齐
str_c("prefix-", c("a", "b", "c"), "-suffix")
## [1] "prefix-a-suffix" "prefix-b-suffix" "prefix-c-suffix"
Subsetting strings 字符子集
str_sub取字符串的闭区间(即包括开始和结束)
x <- c("Apple", "Banana", "Pear")
str_sub(x, 1, 3)
## [1] "App" "Ban" "Pea"
负数,则从后往前数,但仍是左在前,右在后
str_sub(x,-3,-1)
## [1] "ple" "ana" "ear"
字符串过短时,str_sub也不会崩溃
str_sub("a", 1, 5)
## [1] "a"
str_sub可以原地修改
str_sub(x, 1, 1) <- str_to_lower(str_sub(x, 1, 1))
x
## [1] "apple" "banana" "pear"
Locales 大小写转换
- str_to_lower()
- str_to_upper()
- str_to_title()
Matching patterns with regular expressions 正则表达式
为了学习正则表达式,使用str_view和str_view_all
basic
x <- c("apple", "banana", "pear")
str_view(x, "an")
.可以匹配任意字符(除了空行)
str_view(x, ".a.")
如果要匹配. 前面还需再加个\.,但为了正则表达式,还需要再加个\.
str_view(c("abc", "a.c", "bef"), "a\\.c")
如果要匹配\,需要四个斜杠
x <- "a\\b"
str_view(x, "\\\\")
Anchors 锚
从开头或结尾开始匹配
- \^ 从开头开始匹配
- \$ 从结尾开始匹配
x <- c("apple", "banana", "pear")
str_view(x, "^a")
str_view(x, "a$")
如果要严格匹配某个字符串,可以同时使用\^和\$
x <- c("apple pie", "apple", "apple cake")
str_view(x, "apple")
str_view(x, "^apple$")
Character classes and alternatives
除了\.外其他的匹配
- \d: matches any digit.
- \s: matches any whitespace (e.g. space, tab, newline).
如果要创建正则表达式,需要有双斜杠