学习《R for Data Science》(8)——Stringr

2019/01/30

library(tidyverse)
library(stringr)

stringr没有被包括在tidyverse中

basic

在字符串中引用符号,需要加\

特殊字符:

String length 字符长度

str_length(c("a", "R for data science", NA))
## [1]  1 18 NA

Combining strings 字符合并

str_c("x", "y")
## [1] "xy"
str_c("x", "y", "z")
## [1] "xyz"
str_c("x", "y", sep = ", ")
## [1] "x, y"

将所要合并的向量作为第一个参数传入,需要用到collapse参数

str_c(1:10,collapse = "")
## [1] "12345678910"

str_c是向量化的,对于短向量会自动循环补齐

str_c("prefix-", c("a", "b", "c"), "-suffix")
## [1] "prefix-a-suffix" "prefix-b-suffix" "prefix-c-suffix"

Subsetting strings 字符子集

str_sub取字符串的闭区间(即包括开始和结束)

x <- c("Apple", "Banana", "Pear")
str_sub(x, 1, 3)
## [1] "App" "Ban" "Pea"

负数,则从后往前数,但仍是左在前,右在后

str_sub(x,-3,-1)
## [1] "ple" "ana" "ear"

字符串过短时,str_sub也不会崩溃

str_sub("a", 1, 5)
## [1] "a"

str_sub可以原地修改

str_sub(x, 1, 1) <- str_to_lower(str_sub(x, 1, 1))
x
## [1] "apple"  "banana" "pear"

Locales 大小写转换

  • str_to_lower()
  • str_to_upper()
  • str_to_title()

Matching patterns with regular expressions 正则表达式

为了学习正则表达式,使用str_view和str_view_all

basic

x <- c("apple", "banana", "pear")
str_view(x, "an")
  • apple
  • banana
  • pear

.可以匹配任意字符(除了空行)

str_view(x, ".a.")
  • apple
  • banana
  • pear

如果要匹配. 前面还需再加个\.,但为了正则表达式,还需要再加个\.

str_view(c("abc", "a.c", "bef"), "a\\.c")
  • abc
  • a.c
  • bef

如果要匹配\,需要四个斜杠

x <- "a\\b"
str_view(x, "\\\\")
  • a\b

Anchors 锚

从开头或结尾开始匹配

  • \^ 从开头开始匹配
  • \$ 从结尾开始匹配
x <- c("apple", "banana", "pear")
str_view(x, "^a")
  • apple
  • banana
  • pear
str_view(x, "a$")
  • apple
  • banana
  • pear

如果要严格匹配某个字符串,可以同时使用\^和\$

x <- c("apple pie", "apple", "apple cake")
str_view(x, "apple")
  • apple pie
  • apple
  • apple cake
str_view(x, "^apple$")
  • apple pie
  • apple
  • apple cake

Character classes and alternatives

除了\.外其他的匹配

  • \d: matches any digit.
  • \s: matches any whitespace (e.g. space, tab, newline).

如果要创建正则表达式,需要有双斜杠