How to Validate a Large File Downloaded from Object Storage (OCI)

When you upload a relatively large file to Object Storage in OCI, the MD5 hash is not readily available. That is because a large file is split into multiple parts, and each part is uploaded separately. When you download the file, the parts are downloaded sequentially and assembled into a single file on the client side. Object Storage does not calculate the MD5 hash of the reassembled object on the service side because of the processing power that would require, so when you view the object's details in Object Storage, you don't see an actual MD5 hash.

Though opc-multipart-md5 looks promising, it is only part of the picture: it is not an MD5 hash of the whole file. To illustrate the point, when I uploaded a small file, the MD5 hash was calculated and available on the service side.

Now, how do we solve this problem? The best approach is to calculate the MD5 hash with md5sum before you upload the file, and then attach that hash as metadata when uploading the file to Object Storage.

You can retrieve the object's metadata by executing the following command.

 oci os object head --auth instance_principal -bn backup --name 2022-03-20.zip

Here is the data you get as JSON.

{
  "accept-ranges": "bytes",
  "access-control-allow-credentials": "true",
  "access-control-allow-methods": "POST,PUT,GET,HEAD,DELETE,OPTIONS",
  "access-control-allow-origin": "*",
  "access-control-expose-headers": "accept-ranges,access-control-allow-credentials,access-control-allow-methods,access-control-allow-origin,content-length,content-type,date,etag,last-modified,opc-client-info,opc-client-request-id,opc-meta-md5hash,opc-multipart-md5,opc-request-id,storage-tier,version-id,x-api-id",
  "content-length": "217286450",
  "content-type": "application/octet-stream",
  "date": "Sun, 20 Mar 2022 04:42:05 GMT",
  "etag": "500168df-c90a-4d35-b4f6-c7a6c99d5969",
  "last-modified": "Sun, 20 Mar 2022 04:40:27 GMT",
  "opc-client-request-id": "92C495DFAA8647C4B230B10580FED145",
  "opc-meta-md5hash": "1eed774bb61c15f8c50c7771e71bbb24",
  "opc-multipart-md5": "i1Ap4X2OnVAU7aK8RwxgMg==-2",
  "opc-request-id": "iad-1:mHbU0Aq4kW9abs3NCSv77cOKPDYdcQ74lsT4sPgfDI44xXLWLwYk8MKcX3WmPE7L",
  "storage-tier": "Standard",
  "version-id": "29ba4883-5fb3-4316-acbc-ceb218b5e3d1",
  "x-api-id": "native"
}
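
If you only need the hash itself, you can extract the opc-meta-md5hash field from that JSON. A minimal example, assuming jq is installed on the instance:

expected_md5=$(oci os object head --auth instance_principal -bn backup --name 2022-03-20.zip | jq -r '."opc-meta-md5hash"')
echo $expected_md5   # 1eed774bb61c15f8c50c7771e71bbb24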

So the process is: run oci os object head on the object you are about to download and keep the opc-meta-md5hash value in a variable (as shown above). Then, once you have downloaded the file, run md5sum on it and check whether the calculated MD5 matches the one returned by oci os object head.
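
Here is a minimal sketch of that verification step. It assumes the same bucket (backup) and date-based object name used in the upload script below, and that jq is available; adjust the names to your setup.

filename=$(date +%Y-%m-%d).zip

# Read the MD5 hash that was attached as metadata at upload time.
expected=$(oci os object head --auth instance_principal -bn backup --name $filename | jq -r '."opc-meta-md5hash"')

# Download the object and calculate the MD5 hash of the local copy.
oci os object get --auth instance_principal -bn backup --name $filename --file $filename
actual=$(md5sum $filename | awk '{ print $1 }')

# Compare the two hashes.
if [ "$expected" = "$actual" ]; then
  echo "MD5 OK: $filename"
else
  echo "MD5 mismatch: expected $expected, got $actual" >&2
  exit 1
fi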

Here is the bash script I came up with to upload the file with the metadata.

# Create a fresh archive of the WordPress directory.
rm -f /home/opc/backup.zip
zip -r /home/opc/backup.zip /home/opc/wordpress/*

# Calculate the MD5 hash of the archive and build the metadata JSON.
md5=$(md5sum /home/opc/backup.zip | awk '{ print $1 }')
filename=$(date +%Y-%m-%d).zip
json='{"md5hash":"'$md5'"}'

# Upload the archive with the MD5 hash attached (it shows up as opc-meta-md5hash).
oci os object put --auth instance_principal -bn backup --file /home/opc/backup.zip --name $filename --force --metadata "$json"

I have set up a cron job to run the bash script every day, so the file backup is automated.
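
For example, assuming the script above is saved as /home/opc/backup.sh (a hypothetical path), a crontab entry like this runs it every day at 3 a.m.:

0 3 * * * /home/opc/backup.sh >> /home/opc/backup.log 2>&1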
