S3/Swift object storage in the de.NBI Cloud

Introduction

Object storage is a way to store and share data in cloud environments. An object consists of a unique name, the actual data and associated metadata (such as access restrictions and user-defined metadata). In contrast to filesystems, objects are not organised in a hierarchy but in flat containers. The data is accessed via HTTP-based protocols/APIs (application programming interfaces) such as S3 and Swift.
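For illustration: an object name may contain slashes, but these are part of the name, not directories. Assuming a hypothetical bucket mydata on the server introduced below, the object results/2020/run1.fastq.gz would be addressed as

https://s3.computational.bio.uni-giessen.de/mydata/results/2020/run1.fastq.gz

The flat namespace simply maps this single name to the object.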

Object stores have a higher abstraction from the underlying hardware than filesystems, enabling more flexibility on the administration side, e.g. distribution of data, replication of data and transparent extension of the storage capacity.

Users benefit from the ubiquitous availability of the storage service. The data can be read and written from anywhere, as long as the user has the appropriate access permissions. These permissions can be controlled via access control lists by the container owners. Thanks to the concept of large objects, objects have practically no size limit. A caveat of object stores is the possibly higher network latency between the storage servers and the client machines.
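As an example of such permissions, the s3cmd client configured later on this page can make a single object publicly readable (hypothetical bucket and object names):

s3cmd setacl --acl-public s3://mybucket/shared/report.pdf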

See also: Introduction to object storage by Rackspace

Object storage access

The object storage in the de.NBI cloud can be accessed either via the S3 API or the Swift API. These APIs differ in their terminology, capabilities and respective tools: objects are stored in containers (Swift) or buckets (S3). Access is controlled via access control lists (S3 and Swift) or via policies (S3 only).

The S3 and Swift APIs in the de.NBI cloud are based on radosgw, a part of the Ceph object store. For further details on this implementation see http://docs.ceph.com/docs/giant/radosgw/ .

Both APIs have command line clients as well as graphical user interfaces.

URL of the S3/Swift server: https://s3.computational.bio.uni-giessen.de

Access via website

The OpenStack dashboard provides an easy way to manage the object storage via the side menu.

By clicking on Containers, the object storage buckets available to the current project are displayed.

This basic interface allows the creation of containers and the upload/download of files.

Access via s3

S3 offers a lot of functionality and a very fine-grained authorization model (see Amazon's description of the S3 API). The OpenStack dashboard only offers the bare minimum to handle buckets and files. The same is true for the Swift API, which is not explained on this site.

In all standard use cases, a bucket should be created via the OpenStack dashboard. This assigns the bucket to the OpenStack project and grants all project users access to it. Buckets for individual users or use cases are possible, but require a different way of creating access credentials. Contact support if you need such a bucket.

Retrieving access credentials

Access to the S3 API requires credentials that are accepted by the radosgw instance. The instance is linked to Keystone and is able to authenticate users based on credentials generated by OpenStack. These credentials are bound to a project, so different projects require different credentials.

Unfortunately the OpenStack dashboard does not provide access to the necessary credentials, so the command line has to be used:

  • Create an OpenStack v3 RC file

Project → Compute → Access & Security → API Access → Download OpenStack RC File v3

  • Source the downloaded file in a terminal session on a desktop machine or lummerland. This will prompt for your password and set a number of environment variables.
  • You can list all your S3 access keys using
openstack --os-identity-api-version 3 ec2 credentials list
  • To create new credentials, use
openstack --os-identity-api-version 3 ec2 credentials create
  • The access and secret values are the credentials for accessing the S3 buckets of the project.
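If a script only needs these two values, they can be extracted directly with the standard output options of the openstack client (a sketch; the field names access and secret are taken from the table output and may differ between OpenStack releases):

openstack --os-identity-api-version 3 ec2 credentials create -f value -c access -c secret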

Configuration of S3 clients

This section demonstrates how to access the BCF S3 object storage using the s3cmd command line utility. The setup should be similar for other tools, libraries or applications.

  • Invoke s3cmd --configure to start the interactive configuration dialog
  • Enter the access and secret key
  • Do not change the default region
  • Set up encryption if necessary, or press enter to skip it
  • Use the HTTPS protocol (S3 at BCF is NOT available via HTTP)
  • Enter proxy details if necessary
  • Skip the access test, since s3cmd contacts the Amazon servers by default
  • Save the settings
  • Edit the generated configuration and change the host_base value to s3.computational.bio.uni-giessen.de
  • Also change the host_bucket value to %(bucket)s.s3.computational.bio.uni-giessen.de (see the configuration excerpt after this list)
  • Test the configuration, e.g. by invoking s3cmd ls to list all buckets. The output should contain one line of the form s3://<bucket name> for each bucket in the project
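After these changes the relevant entries of the generated configuration file (typically ~/.s3cfg) should look like this (values are placeholders):

access_key = <your access key>
secret_key = <your secret key>
host_base = s3.computational.bio.uni-giessen.de
host_bucket = %(bucket)s.s3.computational.bio.uni-giessen.de
use_https = True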

If you need access to a different project, you can create multiple configuration files for s3cmd.
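For example, a second configuration file (hypothetical name) can be created and used via the -c option:

s3cmd --configure -c ~/.s3cfg-project2
s3cmd -c ~/.s3cfg-project2 ls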

Limitations of Ceph RadosGW

Certain aspects of S3 are not supported yet in Ceph, including:

  • Object lifecycle management (e.g. automatic deletion or staging similar to Amazon Glacier; deletion supported since the Luminous release, but not tested yet)
  • S3 based web sites
  • AWS4 authentication with OpenStack credentials (unsure about this, needs to be tested; stick to AWS v2)
  • The SSL certificate does not support wildcard hostnames yet (resolved by the new wildcard certificate)

For programmers

Java

The official Ceph documentation for the AWS SDK for Java is outdated and results in an error. Please use the builder (AmazonS3ClientBuilder) instead of the deprecated AmazonS3Client constructors. An example can be found below:

AWSCredentials credentials = new BasicAWSCredentials(this.executorS3Key, this.executorS3Secret);

// The endpoint configuration carries both the endpoint URL and the region name.
// Do not additionally call withRegion(): the builder accepts only one of the two.
AwsClientBuilder.EndpointConfiguration regionOne =
        new AwsClientBuilder.EndpointConfiguration("https://s3.computational.bio.uni-giessen.de", "RegionOne");

AmazonS3 amazonS3 = AmazonS3ClientBuilder.standard()
        .withCredentials(new AWSStaticCredentialsProvider(credentials))
        .withEndpointConfiguration(regionOne)
        .build();

for (Bucket bucket : amazonS3.listBuckets()) {
    System.out.println(bucket.getName());
}

S3 supports the generation of temporary download URLs, so-called signed URLs. You can generate signed URLs with the help of s3cmd. For a download link that is valid for a defined amount of time, use

s3cmd signurl <path_to_object> +<availability_in_seconds>

If you want to generate a download link that is valid until a certain moment in time, you can use

s3cmd signurl <path_to_object> <seconds_since_01_01_1970>

This will generate a signed URL that you can pass to the person who will download the object/file. (Check the protocol of the link: if it is http, change it to https.)
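A concrete example (hypothetical bucket and object names) that creates a link valid for one hour:

s3cmd signurl s3://mybucket/results/report.pdf +3600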

Access via swift

The de.NBI object storage can be accessed through the Swift API with the swift command line tool. For this you need a cloud project, the swift client and an OpenStack RC file (see Setup of swift tool).

Setup of swift tool

  • Install python3-swiftclient (Ubuntu)
  • Obtain an OpenStack RC file: cloud web interface → login → Compute → Access & Security → tab “API Access” → button “DOWNLOAD OPENSTACK RC FILE V3”
  • On the command line:
source <youropenstackrc>
  • Enter your password when asked
  • Check if everything worked with
swift list
  • If everything is okay, you will either see your containers (if you have any) or nothing. Otherwise you will see an error message

Create container

swift post <containername>

Upload file

# for small files
swift upload <container> <file>

# for large files >= 5 GB, the object has to be split into segments
# of at most 5 GB, e.g. -S 1073741824 for 1 GiB segments
swift upload -S <chunksize in bytes> <container> <file>

Download file

swift download <container> <file>

Remove file

swift delete <container> <file>

Remove container

swift delete <container>
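Putting it all together, a minimal session with hypothetical container and file names:

swift post mydata                     # create the container
swift upload mydata results.tar.gz    # upload a file
swift list mydata                     # list the objects in the container
swift download mydata results.tar.gz  # download the file again
swift delete mydata results.tar.gz    # remove the file
swift delete mydata                   # remove the now empty container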

Access via s3cmd

The following page documents how to use s3cmd to interact with your S3 buckets. You get your access key and secret key via the OpenStack platform (Project → API Access → View Access Data).

https://s3tools.org/s3cmd-howto

Howto

Configure CORS for an S3 bucket

CORS can be configured via an XML configuration file (e.g. cors.xml):

<CORSConfiguration>
 <CORSRule>
   <AllowedOrigin>https://<replace-me></AllowedOrigin>
   <AllowedOrigin>http://<replace-me>:*</AllowedOrigin>

   <AllowedMethod>GET</AllowedMethod>
   <AllowedMethod>PUT</AllowedMethod>

   <AllowedHeader>*</AllowedHeader>
 </CORSRule>
</CORSConfiguration>

Set this configuration for a bucket with

s3cmd setcors cors.xml s3://<your-bucket>
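To check that the rule is active, a browser preflight request can be simulated with curl (hypothetical origin and object names; if origin and method match a rule, the response contains the corresponding Access-Control-Allow-* headers):

curl -i -X OPTIONS \
  -H "Origin: https://<replace-me>" \
  -H "Access-Control-Request-Method: GET" \
  https://s3.computational.bio.uni-giessen.de/<your-bucket>/<some-object>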

Further information can be found at https://docs.aws.amazon.com/AmazonS3/latest/userguide/ManageCorsUsing.html
