This function allows users to (i) generate md5 checksums for files that have been copied, downloaded, or transferred and (ii) check that these match the md5 checksums generated from the original files.

md5_check(file_paths, original_md5, column_to_join_by)

Arguments

file_paths

a data.frame or tibble object with the following two columns:

  • file_path: full file path for files for which md5s should be generated

  • file_name: file name, which should match the file name in the original_md5 dataframe.

original_md5

a data.frame or tibble object, with md5 checksums from the original files. Should include two columns:

  • file_name: file name

  • original_md5: md5 checksum
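If no checksum table exists yet for the original files, one with the two columns described above could be built along these lines. This is a minimal sketch in base R using `tools::md5sum()`. The helper name `make_original_md5` and the temporary file are hypothetical and are not part of the package.

```r
# Hypothetical helper (not part of the package): build a table with the
# columns `file_name` and `original_md5` for a vector of file paths.
make_original_md5 <- function(paths) {
  data.frame(
    file_name = basename(paths),
    # tools::md5sum() returns a named character vector; drop the names
    original_md5 = unname(tools::md5sum(paths)),
    stringsAsFactors = FALSE
  )
}

# Demonstration on a temporary file standing in for an "original" file
tmp <- tempfile(fileext = ".txt")
writeLines("example content", tmp)
original_md5 <- make_original_md5(tmp)
original_md5
```

The table should be generated where the original files live, before they are transferred, so that it reflects the source rather than the copy.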

column_to_join_by

a character() vector giving the name of the column in the dataframe supplied to original_md5 that will be used to join the original and transferred md5 dataframes.

Value

a tibble object with the following columns:

  • file_path: full file path for files for which md5s were generated

  • file_name: file name

  • original_md5: original md5 checksum

  • new_md5: generated md5 checksum

  • same_md5: contains values TRUE/FALSE; FALSE if md5 checksums do not match, and TRUE if they match.
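The columns above suggest a three-step process: checksum each transferred file, join to the original checksums, and compare. The sketch below illustrates that idea in base R. It is not the package's actual source: the name `md5_check_sketch`, the use of `merge()` (an inner join), and the resulting column order are all assumptions made for illustration.

```r
# Hypothetical re-implementation for illustration only; the real md5_check()
# may differ (for example in join type or column order).
md5_check_sketch <- function(file_paths, original_md5,
                             column_to_join_by = "file_name") {
  # Step 1: generate checksums for the transferred files
  file_paths$new_md5 <- unname(tools::md5sum(file_paths$file_path))
  # Step 2: join to the original checksums (inner join: unmatched rows drop)
  joined <- merge(file_paths, original_md5, by = column_to_join_by)
  # Step 3: flag whether the two checksums agree
  joined$same_md5 <- joined$original_md5 == joined$new_md5
  joined
}

# Demonstration: copy a temporary file and compare the copy to the original
src_dir <- tempfile("src")
dst_dir <- tempfile("dst")
dir.create(src_dir)
dir.create(dst_dir)
src <- file.path(src_dir, "example.txt")
writeLines("example content", src)
file.copy(src, dst_dir)
dst <- file.path(dst_dir, "example.txt")

original <- data.frame(
  file_name = basename(src),
  original_md5 = unname(tools::md5sum(src)),
  stringsAsFactors = FALSE
)
transferred <- data.frame(
  file_path = dst,
  file_name = basename(dst),
  stringsAsFactors = FALSE
)

result <- md5_check_sketch(transferred, original)
result
```

With an inner join, a transferred file whose name has no match in the original table is dropped from the result rather than reported as a mismatch, so it is worth comparing the number of rows returned with the number of files supplied.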

Examples

file_path <- system.file(
  "testdata",
  package = "rutils",
  mustWork = TRUE
)

original_md5 <- readr::read_delim(
  file.path(file_path, "md5_test_data.txt"),
  delim = "\t"
)
#>
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   file_name = col_character(),
#>   original_md5 = col_character()
#> )

file_paths <- tibble::tibble(
  file_path = list.files(
    file_path,
    pattern = "over.chain",
    full.names = TRUE
  )
) %>%
  dplyr::mutate(
    file_name = basename(file_path)
  )

md5_df <- md5_check(
  file_paths = file_paths,
  original_md5 = original_md5,
  column_to_join_by = "file_name"
)

md5_df
#> # A tibble: 0 x 5
#> # … with 5 variables: file_path <chr>, file_name <chr>, original_md5 <chr>,
#> #   new_md5 <chr>, same_md5 <lgl>
print("All check sums match between files?")
#> [1] "All check sums match between files?"
all(md5_df$same_md5)
#> [1] TRUE
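Note that the tibble printed in the example has zero rows, and `all()` applied to an empty logical vector returns TRUE. A TRUE result here therefore does not by itself show that any file was checked. A stricter summary might look like the sketch below; the helper name `all_md5_match` is hypothetical and is not part of the package.

```r
# Hypothetical helper (not part of the package): TRUE only when at least one
# file was checked, no comparison is missing, and every checksum matches.
all_md5_match <- function(md5_df) {
  nrow(md5_df) > 0 && !anyNA(md5_df$same_md5) && all(md5_df$same_md5)
}

# An empty result, like the one in the example above
empty_df <- data.frame(same_md5 = logical(0))
all(empty_df$same_md5)   # TRUE: all() of an empty vector is vacuously TRUE
all_md5_match(empty_df)  # FALSE: nothing was actually checked

# A non-empty result with one mismatch
mixed_df <- data.frame(same_md5 = c(TRUE, FALSE))
all_md5_match(mixed_df)  # FALSE
```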