tidyverse Study Notes: Time

Time Operators

Introduction

There are three types of date/time data that refer to an instant in time:

  • A date. Tibbles print this as <date>.

  • A time within a day. Tibbles print this as <time>.

  • A date-time is a date plus a time: it uniquely identifies an instant in time (typically to the nearest second). Tibbles print this as <dttm>. Base R calls these POSIXct.

today()
#> [1] "2024-12-06"
now()
#> [1] "2024-12-06 23:08:32 UTC"

Type   Code  Meaning                          Example
Year   %Y    4 digit year                     2021
       %y    2 digit year                     21
Month  %m    Number                           2
       %b    Abbreviated name                 Feb
       %B    Full name                        February
Day    %d    One or two digits                2
       %e    Two digits                       02
Time   %H    24-hour hour                     13
       %I    12-hour hour                     1
       %p    AM/PM                            pm
       %M    Minutes                          35
       %S    Seconds                          45
       %OS   Seconds with decimal component   45.35
       %Z    Time zone name                   America/Chicago
       %z    Offset from UTC                  +0800
Other  %.    Skip one non-digit               :
       %*    Skip any number of non-digits

Functions

Creating date/times

During import

ISO8601

ISO8601 is an international standard for writing dates in which the components are ordered from largest to smallest and separated by -. For example, in ISO8601 May 3 2022 is written 2022-05-03. ISO8601 dates can also include times, where hour, minute, and second are separated by :, and the date and time components are separated by either a T or a space (e.g., 2022-05-03 16:26:43 or 2022-05-03T16:26:43).

If a CSV file contains an ISO8601 date or date-time, readr will recognize it automatically.

csv <- "
  date,datetime
  2022-01-02,2022-01-02 05:12
"
read_csv(csv)
#> # A tibble: 1 × 2
#>   date       datetime           
#>   <date>     <dttm>             
#> 1 2022-01-02 2022-01-02 05:12:00
col_

The col_*() functions are used in readr to parse columns when importing a dataset (e.g., with read_csv() or read_tsv()). They let you state each column's type precisely instead of relying on readr's guess.

  1. col_date(format = )

  2. col_time(format = )

  3. col_datetime(format = )

format: A character string specifying the expected input format for the column data.

data <- "date,time,datetime
2023-12-07,14:30:00,2023-12-07 14:30:00
2023-12-08,15:45:00,2023-12-08 15:45:00"

# Read and specify column types
df <- read_csv(data, col_types = cols(
  date = col_date(format = "%Y-%m-%d"),
  time = col_time(format = "%H:%M:%S"),
  datetime = col_datetime(format = "%Y-%m-%d %H:%M:%S")
))
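
The format string matters most when a date is ambiguous. As a minimal sketch, assuming readr is installed, the same input can be parsed three different ways depending on the codes supplied (the input value is only an illustration):

```r
library(readr)

# "01/02/15" is ambiguous: the format string decides which field is which.
us_style <- parse_date("01/02/15", format = "%m/%d/%y")  # month/day/year
eu_style <- parse_date("01/02/15", format = "%d/%m/%y")  # day/month/year
ymd_like <- parse_date("01/02/15", format = "%y/%m/%d")  # year/month/day

us_style
#> [1] "2015-01-02"
eu_style
#> [1] "2015-02-01"
ymd_like
#> [1] "2001-02-15"
```

The same format strings can be passed to col_date() inside cols() when reading a whole file.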

From strings

  1. date:
  • ymd(...)
  • ydm(...)
  • mdy(...)
  • myd(...)
  • dmy(...)
  • dym(...)
  • ym(...)
  • my(...)
  2. time (lubridate returns these as periods):
  • hms(...)
  • hm(...)
  • ms(...)
  3. datetime:
  • ymd_hms(...)
  • ymd_hm(...)
  • ymd_h(...)
  • dmy_hms(...)
  • dmy_hm(...)
  • dmy_h(...)
  • mdy_hms(...)
  • mdy_hm(...)
  • mdy_h(...)
  • ydm_hms(...)
  • ydm_hm(...)
  • ydm_h(...)
  4. others:
  • yq(...):

    Parses a date written as a year followed by a quarter.

    yq("2022 Q1")
    #> "2022-01-01"
    yq("2022 Q2")
    #> "2022-04-01"
    yq("2022 Q3")
    #> "2022-07-01"
    yq("2022 Q4")
    #> "2022-10-01"
    

  • ...: A character or numeric vector of the date-time values to parse.

The truncated variants in lubridate (such as ym() or ymd_hm()) fill in the components that the input omits with default values (the day defaults to 1, the minute to 00, and the second to 00), so incomplete inputs still parse cleanly.
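
As a small sketch of how the function name tracks the order of the components (assuming lubridate is installed; the input strings are made up for illustration):

```r
library(lubridate)

# Choose the parser whose letters match the order of year, month, and day.
a <- ymd("2017-01-31")
b <- mdy("01/31/2017")
c <- dmy("31-01-2017")

# Truncated variants fill in the missing pieces with defaults.
first_of_month <- ym("2022-05")
first_of_month
#> [1] "2022-05-01"
no_seconds <- ymd_hm("2017-01-31 20:11")
no_seconds
#> [1] "2017-01-31 20:11:00 UTC"

# Supplying tz makes a date parser return a date-time instead of a date.
with_tz <- ymd("2017-01-31", tz = "UTC")
with_tz
#> [1] "2017-01-31 UTC"
```

All three of a, b, and c hold the same date, 2017-01-31.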

From individual components

  1. make_date(year = , month = , day = )

  2. make_datetime(year = , month = , day = , hour = , min = , sec = , tz = "UTC")


  • year: numeric year

  • month: numeric month

  • day: numeric day

  • hour: numeric hour

  • min: numeric minute

  • sec: numeric second

  • tz: time zone. Defaults to UTC.

flights |> 
  select(year, month, day, hour, minute) |> 
  mutate(departure = make_datetime(year, month, day, hour, minute))
#> # A tibble: 336,776 × 6
#>    year month   day  hour minute departure          
#>   <int> <int> <int> <dbl>  <dbl> <dttm>             
#> 1  2013     1     1     5     15 2013-01-01 05:15:00
#> 2  2013     1     1     5     29 2013-01-01 05:29:00
#> 3  2013     1     1     5     40 2013-01-01 05:40:00
#> 4  2013     1     1     5     45 2013-01-01 05:45:00
#> 5  2013     1     1     6      0 2013-01-01 06:00:00
#> 6  2013     1     1     5     58 2013-01-01 05:58:00
#> # ℹ 336,770 more rows
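
For a version that does not need the flights data, here is a minimal sketch with literal values (assuming lubridate is installed; the variable names are only for illustration):

```r
library(lubridate)

# Build a date from its parts.
d <- make_date(year = 2022, month = 5, day = 3)
d
#> [1] "2022-05-03"

# Build a date-time; the time zone defaults to UTC.
dt <- make_datetime(2022, 5, 3, 16, 26, 43)
dt
#> [1] "2022-05-03 16:26:43 UTC"

# Omitted time components default to 0, which gives midnight.
midnight <- make_datetime(2022, 5, 3)
midnight
#> [1] "2022-05-03 UTC"
```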

From other types

  1. as_date(...)

  2. as_datetime(...)

For example:

as_datetime(today())
#> [1] "2024-12-06 UTC"
as_date(now())
#> [1] "2024-12-06"
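
Both functions also accept plain numbers, which they treat as offsets from the Unix epoch (1970-01-01). A short sketch, assuming lubridate is installed:

```r
library(lubridate)

# A number passed to as_datetime() is read as seconds since 1970-01-01 00:00:00 UTC.
ten_hours_in <- as_datetime(60 * 60 * 10)
ten_hours_in
#> [1] "1970-01-01 10:00:00 UTC"

# A number passed to as_date() is read as days since 1970-01-01.
ten_years_in <- as_date(365 * 10 + 2)
ten_years_in
#> [1] "1980-01-01"
```

The "+ 2" accounts for the two leap days (1972 and 1976) in that ten-year stretch.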

update

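update() returns a new date-time with several components changed at once; the original object is not modified.

datetime <- ymd_hms("2026-07-08 12:34:56")
update(datetime, year = 2030, month = 2, mday = 2, hour = 2)
#> [1] "2030-02-02 02:34:56 UTC"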
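
One behaviour worth knowing: values that are too large roll over into the next unit rather than raising an error. A minimal sketch, assuming lubridate is installed (variable names are illustrative):

```r
library(lubridate)

start <- ymd("2023-02-01")

# February 2023 has 28 days, so day 30 rolls over into March.
rolled_day <- update(start, mday = 30)
rolled_day
#> [1] "2023-03-02"

# 400 hours is 16 days and 16 hours past midnight on the 1st.
rolled_hour <- update(start, hour = 400)
rolled_hour
#> [1] "2023-02-17 16:00:00 UTC"

# The input itself is untouched.
start
#> [1] "2023-02-01"
```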

Getting Components

  1. date(x)

  2. year(x)

  3. month(x, label, abbr)

  4. day:

  • wday(x, label, abbr): The day of the week.

  • mday(x): The day of the month.

  • qday(x): The day of the quarter.

  • yday(x): The day of the year.

  5. hour(x)

  6. minute(x)

  7. second(x)

  8. tz(x): Time zone.

  9. quarter(x)

  10. semester(x, with_year)

  11. am and pm:

  • am(x)

  • pm(x)

  12. leap_year(x)

  13. dst(x): Whether daylight saving time is in effect.


  • x: A date-time object from which components will be extracted.

  • label: A logical value applicable to wday() and month().

    • Defaults to FALSE, which returns numeric representations (e.g., 1-7 for weekdays, or 1-12 for months).
  • abbr: A logical value used with label = TRUE in wday() and month().

    • If TRUE, the function returns an abbreviated ordered factor (e.g., “Mon” for wday(), or “Jan” for month()).
    • Defaults to TRUE. Set to FALSE for full names.
  • with_year: A logical value applicable to semester(). If TRUE, the function returns semesters with years.
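
A quick sketch of the accessors on a single value, assuming lubridate is installed and its default week start (Sunday) has not been changed through options:

```r
library(lubridate)

datetime <- ymd_hms("2026-07-08 12:34:56")

year(datetime)
#> [1] 2026
month(datetime)
#> [1] 7
mday(datetime)
#> [1] 8
yday(datetime)
#> [1] 189
wday(datetime)      # 1 = Sunday by default, so 4 is a Wednesday
#> [1] 4
quarter(datetime)
#> [1] 3
semester(datetime)
#> [1] 2
pm(datetime)
#> [1] TRUE
leap_year(datetime)
#> [1] FALSE

# With label = TRUE the result is an ordered factor; the text depends on the locale.
month(datetime, label = TRUE)
wday(datetime, label = TRUE, abbr = FALSE)
```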


These functions can also modify components of a date/time.

(datetime <- ymd_hms("2026-07-08 12:34:56"))
#> [1] "2026-07-08 12:34:56 UTC"

year(datetime) <- 2030
datetime
#> [1] "2030-07-08 12:34:56 UTC"
month(datetime) <- 01
datetime
#> [1] "2030-01-08 12:34:56 UTC"
hour(datetime) <- hour(datetime) + 1
datetime
#> [1] "2030-01-08 13:34:56 UTC"

Time spans

Duration

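as.duration

as.duration() converts a time span, such as the difftime produced by subtracting two dates, into a duration.
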
# How old is Hadley?
h_age <- today() - ymd("1979-10-14")
h_age
#> Time difference of 16490 days
as.duration(h_age)
#> [1] "1424736000s (~45.15 years)"

is.duration

is.duration(x) tests whether x is a duration object.

duration

duration() creates a duration object with the specified values. Durations always record the time span in seconds.

duration(num = NULL, units = "second", ...)
  1. num

The number or a character vector of time units. In string representation all unambiguous name units and abbreviations and ISO 8601 formats are supported; ‘m’ stands for month and ‘M’ for minutes unless ISO 8601 “P” modifier is present (see examples). Fractional units are supported.

  2. units

A character string that specifies the type of units that num refers to. When num is character, this argument is ignored.

duration(90, "seconds")
#> [1] "90s (~1.5 minutes)"
duration(second = 3, minute = 1.5, hour = 2, day = 6, week = 1)
#> [1] "1130493s (~1.87 weeks)"
duration("2hours 2minutes 1second")
#> [1] "7321s (~2.03 hours)"
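
lubridate also ships shorthand constructors for durations, all prefixed with d. A small sketch, assuming lubridate is installed (variable names are illustrative):

```r
library(lubridate)

ten_min  <- dminutes(10)
one_day  <- ddays(1)
one_year <- dyears(1)   # defined as 365.25 days

ten_min
#> [1] "600s (~10 minutes)"
one_day
#> [1] "86400s (~1 days)"
one_year
#> [1] "31557600s (~1 years)"

# Durations can be added and multiplied like ordinary numbers of seconds.
total <- dyears(1) + dweeks(12) + dhours(15)
total
#> [1] "38869200s (~1.23 years)"

is.duration(total)
#> [1] TRUE
```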

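Durations always count an exact number of seconds, so adding one to a date-time can produce an unexpected clock time when the span crosses a daylight saving time change: in the second result below, the time zone abbreviation shifts from CDT to CST and the clock time moves back an hour. Periods, described next, avoid this by working in calendar units.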

x <- ymd("2009-08-03", tz = "America/Chicago")
x + ddays(1) + dhours(6) + dminutes(30)
#> [1] "2009-08-04 06:30:00 CDT"
x + ddays(100) - dhours(8)
#> [1] "2009-11-10 15:00:00 CST"

Period

period() creates a period object with the specified values.

period(num = NULL, units = "second", ...)
  1. num

A numeric or character vector. A character vector can specify periods in a convenient shorthand format or ISO 8601 specification. All unambiguous name units and abbreviations are supported; “m” stands for months and “M” for minutes unless the ISO 8601 “P” modifier is present. Fractional units are supported, but the fractional part is always converted to seconds.

  2. units

A character vector that lists the type of units to be used. The units in units are matched to the values in num according to their order. When num is character, this argument is ignored.

period(c(3, 1, 2, 13, 1), c("second", "minute", "hour", "day", "week"))
#> [1] "20d 2H 1M 3S"
period(second = 90, minute = 5)
#> [1] "5M 90S"
period("2hours 2minutes 1second")
#> [1] "2H 2M 1S"
ymd("2009-08-03") + period(1, "day")
#> [1] "2009-08-04" 
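
The difference between the two kinds of span shows up most clearly across a daylight saving change. A minimal sketch, assuming lubridate is installed and the system time zone database knows America/New_York, where clocks move forward on 2026-03-08:

```r
library(lubridate)

one_am <- ymd_hms("2026-03-08 01:00:00", tz = "America/New_York")

# A duration adds exactly 86400 seconds, so the clock time shifts by an hour.
plus_duration <- one_am + ddays(1)
plus_duration
#> [1] "2026-03-09 02:00:00 EDT"

# A period adds one calendar day, so the clock time is preserved.
plus_period <- one_am + days(1)
plus_period
#> [1] "2026-03-09 01:00:00 EDT"
```

Note the naming convention: ddays() builds a duration, while days() builds a period.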

Intervals

When dividing periods, the result reflects the average conversion between the units:

years(1) / days(1) 
#> [1] 365.25

This indicates one year is equivalent to an average of 365.25 days.

Intervals represent the time span between two specific date-times, so dividing one by a period gives an exact answer rather than an average. Use %--% to create an interval.

y2023 <- ymd("2023-01-01") %--% ymd("2024-01-01")
y2023
#> [1] 2023-01-01 UTC--2024-01-01 UTC
y2023 / days(1)
#> [1] 365
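
Because an interval knows its endpoints, a leap year comes out one day longer. A short sketch, assuming lubridate is installed (variable names are illustrative):

```r
library(lubridate)

y2024 <- ymd("2024-01-01") %--% ymd("2025-01-01")
days_in_2024 <- y2024 / days(1)
days_in_2024
#> [1] 366

# interval() is the function form of %--%.
same_span <- interval(ymd("2024-01-01"), ymd("2025-01-01"))

# %within% checks whether a date falls inside an interval.
has_leap_day <- ymd("2024-02-29") %within% y2024
has_leap_day
#> [1] TRUE
```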

Round

round_date

round_date() takes a date-time object and time unit, and rounds it to the nearest value of the specified time unit. For rounding date-times which are exactly halfway between two consecutive units, the convention is to round up.

round_date(x, unit = )
  1. x: a vector of date-time objects.

  2. unit: A string, Period object or a date-time object.
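
A brief sketch of round_date() with several units, assuming lubridate is installed (the timestamp is just an example value):

```r
library(lubridate)

x <- ymd_hms("2009-08-03 12:01:59.23")

to_minute <- round_date(x, "minute")
to_minute
#> [1] "2009-08-03 12:02:00 UTC"

to_hour <- round_date(x, "hour")
to_hour
#> [1] "2009-08-03 12:00:00 UTC"

# Just past noon is closer to the next midnight than to the previous one.
to_day <- round_date(x, "day")
to_day
#> [1] "2009-08-04 UTC"

to_month <- round_date(x, "month")
to_month
#> [1] "2009-08-01 UTC"
```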

floor_date and ceiling_date

floor_date() always rounds down and ceiling_date() always rounds up.
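
To see the two side by side, here is a minimal sketch on one timestamp, assuming lubridate is installed:

```r
library(lubridate)

x <- ymd_hms("2009-08-03 12:01:59.23")

hour_start <- floor_date(x, "hour")
hour_start
#> [1] "2009-08-03 12:00:00 UTC"

hour_end <- ceiling_date(x, "hour")
hour_end
#> [1] "2009-08-03 13:00:00 UTC"

month_start <- floor_date(x, "month")
month_start
#> [1] "2009-08-01 UTC"

next_month <- ceiling_date(x, "month")
next_month
#> [1] "2009-09-01 UTC"
```

floor_date() is handy for grouping observations by week or month, since every value in the same unit maps to the same starting point.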

References

R4DS
