Class: FileRenameService

Inherits:
Object
Defined in:
app/services/file_rename_service.rb

Overview

We sometimes have data with filenames that contain characters that AWS S3 cannot handle. In those cases we want to:

  1. Rename the files to names that are legal in AWS S3 by replacing every illegal character with an underscore (_).

  2. Ensure there are no duplicate file names after the renaming by appending an index such as (1) or (2) to each renamed file's name, just before the file extension.

  3. Keep a record of all of the file names as they originally existed and what they were renamed to.

  4. Write that record to a file called files_renamed.txt, which lists every file that has been renamed and what it was renamed to, along with a timestamp.

  5. Add files_renamed.txt to the dataset as a payload file, akin to a README.txt or license.txt.
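
The steps above can be sketched end to end as follows. This is a minimal, hypothetical illustration rather than the service's actual code: the `rename_all` helper, the shortened `ILLEGAL` list, and the record line format (`timestamp original => renamed`) are all assumptions. The source only states that files_renamed.txt lists the renamed files with a timestamp.

```ruby
require "time"

# Shortened stand-in for ILLEGAL_CHARACTERS (assumption: illustration only).
ILLEGAL = ["&", "$", "@", "=", ";", ":", "+", "  ", ","].freeze

# Hypothetical helper: returns [renamed_names, record_lines] for a list of
# bare filenames (no directories). The record line format is an assumption.
def rename_all(filenames, now: Time.now.utc)
  index = 0
  records = []
  renamed = filenames.map do |name|
    # Files without illegal characters keep their names and are not recorded.
    next name unless ILLEGAL.any? { |char| name.include?(char) }

    index += 1
    clean = ILLEGAL.reduce(name) { |acc, char| acc.gsub(char, "_") }
    # Put the index just before the extension so renamed files never collide.
    ext = File.extname(clean)
    new_name = "#{File.basename(clean, ext)}(#{index})#{ext}"
    records << "#{now.iso8601} #{name} => #{new_name}"
    new_name
  end
  [renamed, records]
end

names, records = rename_all(["a&b.csv", "a@b.csv", "plain.csv"])
# names is ["a_b(1).csv", "a_b(2).csv", "plain.csv"]
# records holds one line per renamed file, ready to write to files_renamed.txt
```

Both "a&b.csv" and "a@b.csv" sanitize to the same base name, which is why the index is needed to keep them distinct.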

Constant Summary

ILLEGAL_CHARACTERS =

See docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html for the full list of characters that cannot be used in filenames for AWS S3. This service only attempts to fix the most likely problems. For example, it does not try to handle “ASCII character ranges 00–1F hex (0–31 decimal) and 7F (127 decimal)”. Note that a single space does not trigger a rename, but two consecutive spaces do, because according to the S3 docs: “Significant sequences of spaces might be lost in some uses (especially multiple spaces)”.

[
  "&", "$", "@", "=", ";", ":", "+", "  ", ",", "?", "\\", "{", "}", "^", "%", "`", "[", "]", "'", '"', ">", "<", "~", "#", "|"
].freeze
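
To illustrate the single-space versus double-space rule, here is a small check against the same list. The list is copied so the snippet runs on its own, and the `illegal?` helper is hypothetical, not part of the service.

```ruby
ILLEGAL_CHARACTERS = [
  "&", "$", "@", "=", ";", ":", "+", "  ", ",", "?", "\\", "{", "}", "^", "%",
  "`", "[", "]", "'", '"', ">", "<", "~", "#", "|"
].freeze

# Hypothetical helper: true if the filename contains any listed substring.
def illegal?(filename)
  ILLEGAL_CHARACTERS.any? { |char| filename.include?(char) }
end

illegal?("my data.csv")   # => false (a single space is allowed)
illegal?("my  data.csv")  # => true  (two consecutive spaces)
illegal?("cost$.csv")     # => true
```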

Constructor Details

#initialize(filename:) ⇒ FileRenameService

Returns a new instance of FileRenameService.



# File 'app/services/file_rename_service.rb', line 24

def initialize(filename:)
  @original_filename = filename
end

Instance Attribute Details

#original_filename ⇒ Object (readonly)

Returns the value of attribute original_filename.



# File 'app/services/file_rename_service.rb', line 22

def original_filename
  @original_filename
end

Instance Method Details

#check_if_file_needs_rename ⇒ Object



# File 'app/services/file_rename_service.rb', line 32

def check_if_file_needs_rename
  # True as soon as any illegal character (or double space) is found
  ILLEGAL_CHARACTERS.any? { |char| @original_filename.include?(char) }
end

#needs_rename? ⇒ Boolean

Returns:

  • (Boolean)


# File 'app/services/file_rename_service.rb', line 28

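def needs_rename?
  # `||=` would re-run the check every time the cached result is false
  return @needs_rename if defined?(@needs_rename)

  @needs_rename = check_if_file_needs_rename
end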

#new_filename(index) ⇒ Object

Replace every instance of an illegal character with an underscore. Append an index number in parentheses just before the file extension, so that two files are never accidentally given the same name, which would cause one to overwrite the other.



# File 'app/services/file_rename_service.rb', line 43

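def new_filename(index)
  nf = @original_filename.dup
  ILLEGAL_CHARACTERS.each do |char|
    nf.gsub!(char, "_")
  end
  split = nf.split(".")
  # A name without an extension splits into a single part, and assigning to
  # split[-2] would raise IndexError, so append the index to the whole name.
  return "#{nf}(#{index})" if split.length < 2

  split[-2] = "#{split[-2]}(#{index})"
  split.join(".")
end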
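
A short usage sketch of the renaming behavior, assuming filenames that have an extension. The class below is a condensed copy of the one documented here, with a shortened character list, so the snippet runs without the application loaded. The example filenames are made up.

```ruby
# Condensed copy for illustration only (assumption: shortened character list).
class FileRenameService
  ILLEGAL_CHARACTERS = ["&", "$", "@", ":", "+", "  ", ","].freeze

  def initialize(filename:)
    @original_filename = filename
  end

  def new_filename(index)
    nf = @original_filename.dup
    ILLEGAL_CHARACTERS.each { |char| nf.gsub!(char, "_") }
    split = nf.split(".")
    # The index is attached to the part just before the last extension.
    split[-2] = "#{split[-2]}(#{index})"
    split.join(".")
  end
end

FileRenameService.new(filename: "Q1:results.csv").new_filename(1)
# => "Q1_results(1).csv"
FileRenameService.new(filename: "raw+data.tar.gz").new_filename(2)
# => "raw_data.tar(2).gz"
```

For a multi-part extension such as .tar.gz, the index lands before the last extension only, because the name is split on every dot.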