Class: FileRenameService
- Inherits:
- Object
- Object
- FileRenameService
- Defined in:
- app/services/file_rename_service.rb
Overview
We sometimes have data with filenames that contain characters that AWS S3 cannot handle. In those cases we want to:
- Rename the files to something that is legal in AWS S3, replacing every illegal character with a _ (underscore).
- Ensure there are no duplicate file names after the renaming by appending an index such as (1) or (2) to the end of the filename of each renamed file.
- Keep a record of all of the file names as they originally existed and what they were renamed to.
- The record goes into a file called files_renamed.txt, which lists every file that has been renamed and what it was renamed to, along with a timestamp.
- This files_renamed.txt file gets added to the dataset as a payload file, akin to a README.txt or license.txt.
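The steps above can be sketched as follows. This is a minimal illustration, not the application's actual workflow: the `rename_all` helper, the abbreviated `ILLEGAL` list, the tab-separated log line format, and the running index are all assumptions made for this example. It also uses `File.basename` and `File.extname` to place the index, rather than the `split(".")` approach shown in the method details.

```ruby
require "time"

# Abbreviated stand-in for ILLEGAL_CHARACTERS (assumption: shortened for the example).
ILLEGAL = ["&", "$", "@", "=", ";", ":", "+", "  ", ",", "?"].freeze

# Hypothetical helper: returns [final_names, log_lines] for a list of filenames.
# The log line layout (timestamp, original, renamed) is assumed, not the real
# files_renamed.txt format.
def rename_all(filenames, now: Time.now.utc)
  index = 0
  log = []
  final = filenames.map do |name|
    # Files without illegal characters keep their original name.
    next name unless ILLEGAL.any? { |char| name.include?(char) }

    index += 1
    sanitized = ILLEGAL.reduce(name) { |acc, char| acc.gsub(char, "_") }
    base = File.basename(sanitized, ".*")
    renamed = "#{base}(#{index})#{File.extname(sanitized)}"
    log << "#{now.iso8601}\t#{name}\t#{renamed}"
    renamed
  end
  [final, log]
end

names, log = rename_all(["a&b.csv", "a$b.csv", "clean.csv"], now: Time.utc(2024, 1, 1))
p names # ["a_b(1).csv", "a_b(2).csv", "clean.csv"]
puts log
```

Both `a&b.csv` and `a$b.csv` sanitize to `a_b.csv`, so the appended index is what keeps them from colliding.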
Constant Summary
- ILLEGAL_CHARACTERS =
See this reference for the full list of characters that cannot be used in filenames for AWS S3: docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html
This service will only attempt to fix the most likely problems. For example, we will not try to handle “ASCII character ranges 00–1F hex (0–31 decimal) and 7F (127 decimal)”.
Note that we do not rename for a single space, but we do for two spaces together, because according to the S3 docs: “Significant sequences of spaces might be lost in some uses (especially multiple spaces)”.
[ "&", "$", "@", "=", ";", ":", "+", "  ", ",", "?", "\\", "{", "}", "^", "%", "`", "[", "]", "'", '"', ">", "<", "~", "#", "|" ].freeze
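To make the single-space versus double-space rule concrete, here is a small sketch. It assumes the whitespace entry in the list is the two-space string described above, and `illegal_entries_in` is a hypothetical helper written for this example, not part of the service.

```ruby
# Copy of the constant, assuming the whitespace entry is two spaces.
ILLEGAL_CHARACTERS = [
  "&", "$", "@", "=", ";", ":", "+", "  ", ",", "?", "\\", "{", "}", "^", "%",
  "`", "[", "]", "'", '"', ">", "<", "~", "#", "|"
].freeze

# Hypothetical helper: list the entries of ILLEGAL_CHARACTERS found in a name.
def illegal_entries_in(filename)
  ILLEGAL_CHARACTERS.select { |entry| filename.include?(entry) }
end

p illegal_entries_in("my file.txt")  # [] (a single space is allowed)
p illegal_entries_in("my  file.txt") # ["  "] (two spaces together are flagged)
p illegal_entries_in("a&b#c.txt")    # ["&", "#"]
```

Because `String#include?` does substring matching, a multi-character entry such as the two-space string works in the same list as the single-character entries.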
Instance Attribute Summary
- #original_filename ⇒ Object (readonly)
Returns the value of attribute original_filename.
Instance Method Summary
- #check_if_file_needs_rename ⇒ Object
- #initialize(filename:) ⇒ FileRenameService (constructor)
A new instance of FileRenameService.
- #needs_rename? ⇒ Boolean
- #new_filename(index) ⇒ Object
Replace every instance of an illegal character with an underscore.
Constructor Details
#initialize(filename:) ⇒ FileRenameService
Returns a new instance of FileRenameService.
# File 'app/services/file_rename_service.rb', line 24
def initialize(filename:)
  @original_filename = filename
end
Instance Attribute Details
#original_filename ⇒ Object (readonly)
Returns the value of attribute original_filename.
# File 'app/services/file_rename_service.rb', line 22
def original_filename
  @original_filename
end
Instance Method Details
#check_if_file_needs_rename ⇒ Object
# File 'app/services/file_rename_service.rb', line 32
def check_if_file_needs_rename
  ILLEGAL_CHARACTERS.each do |char|
    return true if @original_filename.include? char
  end
  false
end
#needs_rename? ⇒ Boolean
# File 'app/services/file_rename_service.rb', line 28
def needs_rename?
  @needs_rename ||= check_if_file_needs_rename
end
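One property of the `||=` pattern used in needs_rename? may be worth knowing: `||=` only caches truthy values, so a `false` result is recomputed on every call. The sketch below illustrates this with a hypothetical `MemoDemo` class (not part of the service) and shows a `defined?` check as one possible alternative that caches `false` as well.

```ruby
# Hypothetical class for illustration only; it counts how often the
# underlying computation runs.
class MemoDemo
  attr_reader :calls

  def initialize(result)
    @result = result
    @calls = 0
  end

  # Same pattern as needs_rename?: a false or nil result is never cached.
  def with_or_equals
    @cached ||= compute
  end

  # Alternative: cache any result, including false.
  def with_defined_check
    return @checked if defined?(@checked)

    @checked = compute
  end

  private

  def compute
    @calls += 1
    @result
  end
end

or_equals = MemoDemo.new(false)
2.times { or_equals.with_or_equals }
puts or_equals.calls # 2: the false result was recomputed

checked = MemoDemo.new(false)
2.times { checked.with_defined_check }
puts checked.calls # 1: the false result was cached
```

For a check as cheap as scanning a filename against a short list, the recomputation is unlikely to matter in practice.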
#new_filename(index) ⇒ Object
Replace every instance of an illegal character with an underscore. Append an index number in parentheses just before the file extension, so that two files are never accidentally given the same name, causing one to overwrite the other.
# File 'app/services/file_rename_service.rb', line 43
def new_filename(index)
  nf = @original_filename.dup
  ILLEGAL_CHARACTERS.each do |char|
    nf.gsub!(char, "_")
  end
  split = nf.split(".")
  split[-2] = "#{split[-2]}(#{index})"
  split.join(".")
end
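A usage sketch may help here. To keep the example self-contained, the class is restated from the method listings on this page, assuming the whitespace entry of ILLEGAL_CHARACTERS is two spaces; the filenames and index values are made up for illustration.

```ruby
# Restatement of the documented class so the example runs on its own.
class FileRenameService
  ILLEGAL_CHARACTERS = [
    "&", "$", "@", "=", ";", ":", "+", "  ", ",", "?", "\\", "{", "}", "^", "%",
    "`", "[", "]", "'", '"', ">", "<", "~", "#", "|"
  ].freeze

  attr_reader :original_filename

  def initialize(filename:)
    @original_filename = filename
  end

  def needs_rename?
    @needs_rename ||= check_if_file_needs_rename
  end

  def check_if_file_needs_rename
    ILLEGAL_CHARACTERS.each do |char|
      return true if @original_filename.include? char
    end
    false
  end

  def new_filename(index)
    nf = @original_filename.dup
    ILLEGAL_CHARACTERS.each { |char| nf.gsub!(char, "_") }
    split = nf.split(".")
    split[-2] = "#{split[-2]}(#{index})"
    split.join(".")
  end
end

service = FileRenameService.new(filename: "my&data.v2.csv")
puts service.needs_rename?   # true
puts service.new_filename(1) # my_data.v2(1).csv

# Edge case: with no "." in the name, split(".") yields a single element and
# assigning to split[-2] raises IndexError.
begin
  FileRenameService.new(filename: "README&notes").new_filename(1)
rescue IndexError => e
  puts "no extension: #{e.class}"
end
```

The index lands on the second-to-last dot-separated segment, so only the final segment is treated as the extension. As the last call shows, this sketch suggests callers may need to handle filenames without an extension separately.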