Storage is (still) expensive.
Julia has relatively good support for programmatically accessing Amazon S3 buckets through JuliaCloud’s AWS.jl.
Support for Google Cloud isn’t as good. The GoogleCloud.jl package hasn’t been actively maintained for a while, which is a shame, because Google Cloud is cost-competitive with AWS and other options. Google Cloud also hosts several public storage buckets.
Google and the European Space Agency, for instance, have tried to make it easier for the public to freely access Sentinel 2 data. Sentinel 2 is the designation for the ESA’s imagery satellites. While the resolution isn’t as good as newer commercial offerings, its global coverage, short update frequency, and large number of available bands arguably make it the best option for anyone interested in working with remote sensing data.
These benefits come at a cost. The average ESA SAFE file is well over 2 GB in size, and each tile only covers a small physical area. To put the storage issue into perspective, the index CSV for all of the available Level 2 (processed) SAFE files is itself several gigabytes.
I’ll address these concerns in an upcoming series of tutorials, but for now I want to focus simply on how you can use Julia to access Google Cloud storage buckets, such as their publicly available Sentinel 2 repository.
This tutorial assumes that you have a Julia version newer than 1.5. You will also need to install the latest available version of GoogleCloud.jl with Pkg.add("GoogleCloud").
First, follow the instructions on JuliaCloud’s Google Cloud Quick Start guide to set up your Google Cloud credentials. The instructions themselves are a little out of date. For instance, in step four, you’ll have to select Cloud Storage > Storage Admin. However, they’re still easy to follow.
Once you have your key as a JSON file, save it to an accessible folder.
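Then, open Julia and load GoogleCloud, JSON, and LightXML, an XML parser. JSON and LightXML are separate packages, so install them with Pkg.add() as well if you don’t already have them.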
using GoogleCloud
using JSON
using LightXML
You will then want to initialize your session.
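# Path to the service account key you saved earlier.
credentialPath = "path_to_credentials_json"
credentials = JSONCredentials(credentialPath)
# Create a session with the full-control Cloud Storage scope.
session = GoogleSession(credentials, ["devstorage.full_control"])
# Use this session for all subsequent storage calls.
set_session!(storage, session)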
These steps are explained in greater detail, for anyone interested, in the Quick Start guide linked above.
The next steps are simple, but aren’t well documented. The documentation for GoogleCloud.jl assumes that you are using a private bucket without sub-directories. This poses an issue, since a large private or public storage bucket will almost always have some sort of directory structure.
We’ll start by going over how to get a list of files in a public storage bucket. All of the examples here will use Google Cloud’s Sentinel 2 repository. This repository is, well, large, so we’ll work with a sub-directory, which in this case is an L2A SAFE directory for MGRS tile 50CMA.
# Include all of the code above.
directory = "L2/tiles/50/C/MA/S2A_MSIL2A_20191201T001751_N0213_R030_T50CMA_20191201T014204.SAFE/"
There are a couple of things to be aware of. You should not include a leading slash in your directory string. You should also not include the bucket name, or any Google-specific API particulars. Simply write the directory as you see it on Google Cloud Platform.
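The formatting rules above can be captured in a small helper. This is only a sketch: normalize_prefix is a hypothetical function, not part of GoogleCloud.jl, and it assumes the string you pass in points at a directory, so it always ends the result with a slash.

```julia
# Hypothetical helper (not part of GoogleCloud.jl): clean up a pasted path
# so it can be used as the `prefix` of a listing request.
function normalize_prefix(path::AbstractString, bucket::AbstractString)
    p = String(strip(path))
    # Drop a leading "gs://<bucket>" if a full URI was pasted in.
    uri = "gs://" * bucket
    if startswith(p, uri)
        p = p[ncodeunits(uri)+1:end]
    end
    # Drop any leading slashes.
    p = String(lstrip(p, '/'))
    # Drop the bucket name if it was included as the first path segment.
    if startswith(p, bucket * "/")
        p = p[ncodeunits(bucket)+2:end]
    end
    # An empty prefix means "the whole bucket"; leave it alone.
    isempty(p) && return p
    # Assumption: the path is a directory, so make sure it ends with a slash.
    return endswith(p, "/") ? p : p * "/"
end

bucket = "gcp-public-data-sentinel-2"
println(normalize_prefix("/L2/tiles/50/C/MA", bucket))
println(normalize_prefix("gs://gcp-public-data-sentinel-2/L2/tiles/50/C/MA/", bucket))
```

Both calls should print the same prefix, with no leading slash and no bucket name.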
You parse sub-directories by running GoogleCloud.storage(:Object, :list, bucket; prefix=directory, delimiter="/"), writing the output to an IOBuffer, and converting the IOBuffer to a Dict{} with JSON.parse().
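# List the objects directly under the chosen sub-directory.
rawFolderList = GoogleCloud.storage(:Object, :list, "gcp-public-data-sentinel-2"; prefix=directory, delimiter="/")
# Collect the raw response into a String, then parse it as JSON.
io = IOBuffer()
write(io, rawFolderList)
fileList = String(take!(io))
fileList = JSON.parse(fileList)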
This will output a Dict{String, Any} with two keys, “kind” and “items”. “items” is an array of Dict{String, Any} entries, one for each of the files and folders in the sub-directory. You can return a nested list of files in all of the folders in the sub-directory by removing delimiter="/" in GoogleCloud.storage().
We’ll download MTD_MSIL2A.xml, which contains a lot of interesting metadata. This is the fifth item in fileList["items"]. The information we need is in fileList["items"][5]["name"].
downloadTarget = fileList["items"][5]["name"]
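Relying on a fixed position like 5 is fragile, since the order and number of entries can differ between directories. A minimal sketch of looking the entry up by file name instead is shown here. The find_object_name function and the sampleList data are made up for illustration; sampleList only mimics the shape of the parsed listing described above.

```julia
# Stand-in for the parsed listing. The entries are invented for illustration
# and only mimic the "kind"/"items" layout described above.
sampleList = Dict{String,Any}(
    "kind" => "storage#objects",
    "items" => Any[
        Dict{String,Any}("name" => "some/prefix/INSPIRE.xml"),
        Dict{String,Any}("name" => "some/prefix/MTD_MSIL2A.xml"),
        Dict{String,Any}("name" => "some/prefix/manifest.safe"),
    ],
)

# Hypothetical helper: return the full object name whose last path segment
# matches `filename`, or `nothing` if there is no such entry.
function find_object_name(listing::AbstractDict, filename::AbstractString)
    for item in get(listing, "items", Any[])
        objectName = item["name"]
        if last(split(objectName, '/')) == filename
            return objectName
        end
    end
    return nothing
end

println(find_object_name(sampleList, "MTD_MSIL2A.xml"))
```

With the real listing, you would pass fileList instead of sampleList and check for nothing before downloading.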
Downloading files is as simple as changing :list to :get in GoogleCloud.storage() and removing any optional inputs, such as prefix and delimiter. If you are simply trying to download a file without first using :list to parse the directory structure, you need to be very careful that the location string is formatted correctly.
rawFileData = GoogleCloud.storage(:Object, :get, "gcp-public-data-sentinel-2", downloadTarget)
You then follow the same steps from above to write the output into an IOBuffer.
io = IOBuffer();
write(io, rawFileData)
collectedData = String(take!(io))
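If you also want a local copy of the file, you can write the downloaded bytes straight to disk. The sketch below assumes the download is a vector of bytes, as the IOBuffer steps above imply. The save_object function, sampleData, and the object name are placeholders for illustration, not anything provided by GoogleCloud.jl.

```julia
# Hypothetical helper: write downloaded bytes to `outDir`, using the last
# path segment of the object name as the local file name.
function save_object(data::AbstractVector{UInt8}, objectName::AbstractString,
                     outDir::AbstractString)
    mkpath(outDir)  # create the output folder if it does not exist yet
    localName = String(last(split(objectName, '/')))
    localPath = joinpath(outDir, localName)
    write(localPath, data)
    return localPath
end

# Stand-in bytes; in the tutorial this would be the downloaded data.
sampleData = Vector{UInt8}("<?xml version=\"1.0\"?><root/>")
outDir = mktempdir()
localPath = save_object(sampleData, "some/prefix/MTD_MSIL2A.xml", outDir)
println(localPath, " (", filesize(localPath), " bytes)")
```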
The final step is parsing the collected output with an XML parser. We’re going to use parse_string() from the LightXML package.
outputFile = LightXML.parse_string(collectedData)
The final output is:
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<n1:Level-2A_User_Product xmlns:n1="https://psd-14.sentinel2.eo.esa.int/PSD/User_Product_Level-2A.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://psd-14.sentinel2.eo.esa.int/PSD/User_Product_Level-2A.xsd">
<n1:General_Info>
<Product_Info>
<PRODUCT_START_TIME>2019-12-01T00:17:51.024Z</PRODUCT_START_TIME>
<PRODUCT_STOP_TIME>2019-12-01T00:17:51.024Z</PRODUCT_STOP_TIME>
<PRODUCT_URI>S2A_MSIL2A_20191201T001751_N0213_R030_T50CMA_20191201T014204.SAFE</PRODUCT_URI>
<PROCESSING_LEVEL>Level-2A</PROCESSING_LEVEL>
...
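Once the document is parsed, you will usually want individual values rather than the whole tree. Here is a minimal sketch of one way to do that with LightXML’s root(), child_elements(), name(), and content(). The find_first function is a hypothetical helper, and sampleXML is a shortened stand-in built from the lines shown above, not the full metadata file.

```julia
using LightXML

# Shortened stand-in for the downloaded metadata, based on the output above.
sampleXML = """
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<n1:Level-2A_User_Product xmlns:n1="https://psd-14.sentinel2.eo.esa.int/PSD/User_Product_Level-2A.xsd">
<n1:General_Info>
<Product_Info>
<PRODUCT_START_TIME>2019-12-01T00:17:51.024Z</PRODUCT_START_TIME>
<PROCESSING_LEVEL>Level-2A</PROCESSING_LEVEL>
</Product_Info>
</n1:General_Info>
</n1:Level-2A_User_Product>
"""

# Hypothetical helper: depth-first search for the first element called `tag`.
function find_first(element::XMLElement, tag::AbstractString)
    name(element) == tag && return element
    for child in child_elements(element)
        hit = find_first(child, tag)
        hit === nothing || return hit
    end
    return nothing
end

xdoc = parse_string(sampleXML)
levelElement = find_first(root(xdoc), "PROCESSING_LEVEL")
processingLevel = levelElement === nothing ? nothing : content(levelElement)
startElement = find_first(root(xdoc), "PRODUCT_START_TIME")
startTime = startElement === nothing ? nothing : content(startElement)
free(xdoc)  # LightXML documents wrap C memory and should be freed when done

println(processingLevel)
println(startTime)
```

Searching by tag name keeps the example short, but it returns only the first match, so it is best suited to tags that appear once in the file.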
I’ll be posting a series of tutorials on working with Sentinel 2 data with Sentinel.jl. If you are interested in learning more, please subscribe!