Block-based storage concept of the MPI CBS

Summary: Do you need storage for a research project? You have come to the right page.

Overview

This service is about storing your data in a structured way. It is best practice to separate different sets of data (e.g. different research projects, your home folder, group share folders) and manage them independently (more information about this concept). You have a home directory storage block by default. Additional storage blocks for personal purposes can be requested via the TicketSystem.

This system's primary use is for research data. However, there are other applications.

Research data is usually split into two storage blocks:
  • p_... for raw data, results, documents, etc. (data you want to keep for the duration of the research study). This storage class is better protected (DataProtectionPolicy) but slower.
  • pt_... for running computations. This kind of storage can be bigger and is faster, but its protection level is lower (DataProtectionPolicy).
Neither storage block is suitable as an archive. When a project is finished, you have to take care of putting all your important data into an Archive.

FAQ

How to access data via a standard institute Linux workstation or compute server?

Type this command to get an overview of all storage blocks for which you have read permission or higher:

user@host > mydata

Go to the respective folder via cd like this:

user@host > cd /data/p_ftest

The folder is invisible until you access it explicitly (More information).
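Because a block folder only appears when it is accessed explicitly, a script should open the full path instead of listing /data. The following Python sketch assumes the /data/<name> layout used in the examples on this page. accessible_blocks is a made-up helper, and mydata remains the simplest way to see your storage blocks.

```python
import os

# Linux location of storage blocks, as used in the examples on this page (assumption).
DATA_ROOT = "/data"


def accessible_blocks(names):
    """Return the storage blocks among `names` that the current user can enter and read.

    Block folders only show up on explicit access, so each full path is checked
    directly instead of listing DATA_ROOT.
    """
    result = []
    for name in names:
        path = os.path.join(DATA_ROOT, name)
        # isdir() accesses the full path; R_OK | X_OK means "can list and enter".
        if os.path.isdir(path) and os.access(path, os.R_OK | os.X_OK):
            result.append(name)
    return result
```

For example, accessible_blocks(["p_ftest", "pt_ftest"]) returns the subset of those two blocks that you can open on the current machine.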

How to access data in a CentralWindows session?

Use drive X:. It contains a list of links representing all storage blocks in the StorageUnified system. You will only see the storage blocks you have access to.

Why do I have no access although I have admin permissions?

There are people in the institute who have administrative power over a lot of storage blocks (e.g. the data protection officer). To prevent unnecessary accidents, the permission model works as follows:
  • "Admin permissions" are purely administrative. They enable a person to grant permissions to people/groups, even to themselves, but they do not include read or write access.
  • "Read permissions" allow you to read data from a storage block.
  • "Write permissions" allow you to read and write data from/to a storage block.
If you have administrative permissions and you want to read or write data, go to <https://userportal.cbs.mpg.de/storageunified/manage_permissions> and grant them to yourself.

How to access data on MacOSX or via registered institute laptops?

A gateway service (FileGatewayUser) provides SFTP access to all protected storage of the institute.
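The same storage block is reachable under a different path for each access method. In the cbsdata examples on this page, the paths follow the pattern below. This Python sketch (access_paths is a hypothetical helper) only reproduces that observed pattern. The paths printed by cbsdata -s remain authoritative.

```python
def access_paths(block):
    """Build the Linux, Windows and SFTP paths for a storage block.

    The pattern is copied from the cbsdata example output on this page;
    it is an assumption, not a documented rule.
    """
    return {
        "linux": f"/data/{block}",                                # institute Linux machines
        "windows": f"X:\\{block}",                                # CentralWindows drive X:
        "sftp": f"sftp://filegateway.cbs.mpg.de:/data/{block}",   # gateway for Mac/laptops
    }
```

For example, access_paths("pt_ftest")["sftp"] gives the same SFTP URL that cbsdata -s pt_ftest prints.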

How do I get permissions to access a certain storage block?

Permissions can only be requested by the owner of the data or by a person appointed by the owner. Everyone who can grant permissions is marked with an a or an A in the permissions line of the storage block overview (cbsdata -s [storageblock]). Please ask one of them to grant permissions to you.

Keep in mind:
  • It's not sufficient to forward an email in which someone allows you to access some data.
  • One of the persons with administrative permissions has to contact IT directly.
  • Currently there's no self-service for permission changes. IT has to be involved to grant/revoke permissions. This will change in the future.

Why the funny names?

Usually you don't need to worry about the storage block names. Just remember, bookmark or write down the name and use the respective path in a file manager. Since the names are hard to remember, you might find this tool useful; it shows all storage blocks you have access to:

user@host > mydata

However, a storage block name tells you a lot about the mechanisms working behind the curtain. The name consists of these components:

Flags

Each letter before the first _ has a special meaning:
  • d - Auto-delete. Data here is subject to some form of automatic deletion. The specifics depend on the storage block. It might be storage for temporary data, or the storage block is marked with a deadline (-timeoutYYYYMMDD).
  • e - Accessible from the internet. Such a storage block is physically accessible from the outside (although still subject to access control), while all other storage blocks are physically on servers that cannot be accessed from the internet.
  • g - Per-group storage, only accessible for a single group. The storage block name contains the name of a user group. There is a 1:1 relation (without exceptions!) between group members and users having write permissions there.
  • h - High value data. This storage block is assigned data protection level 3 (DataProtectionPolicy).
  • p - Flexible permission management is in place for this storage block, usually for research projects/studies. Any user of the institute can be assigned read-only, write or administrative permissions.
  • t - Temporary data. This storage block's data protection level is 1 (DataProtectionPolicy): protection against disk failures only.
  • u - Per-user storage, only accessible for a single user. No exceptions are possible.
If the flags contain neither t nor h, the contained data is guaranteed to be assigned data protection level 2 (DataProtectionPolicy).
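The flag rules above fit into a small lookup table. The following Python sketch is an illustration written from this page. The name decode_flags is made up, and the authoritative description of a prefix is whatever cbsdata --list-flags [storageblock] prints. The sketch assumes the flags are exactly the letters before the first _. When both t and h appear it reports no level, because this page does not cover that combination.

```python
# Flag letters as described on this page (the part of the name before the first "_").
FLAG_MEANINGS = {
    "d": "auto-delete",
    "e": "accessible from the internet",
    "g": "per-group storage",
    "h": "high value data (protection level 3)",
    "p": "flexible permission management",
    "t": "temporary data (protection level 1)",
    "u": "per-user storage",
}


def decode_flags(name):
    """Return (list of flag meanings, implied data protection level) for a block name."""
    flags = name.split("_", 1)[0]
    unknown = [f for f in flags if f not in FLAG_MEANINGS]
    if unknown:
        raise ValueError(f"unknown flag(s) {unknown!r} in {name!r}")
    meanings = [FLAG_MEANINGS[f] for f in flags]
    if "t" in flags and "h" in flags:
        level = None  # combination not described on this page
    elif "t" in flags:
        level = 1
    elif "h" in flags:
        level = 3
    else:
        level = 2  # neither t nor h: level 2 is guaranteed
    return meanings, level
```

For example, decode_flags("pt_ftest") yields level 1, which matches the protection line of the cbsdata example output for that block.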

Association

Each u and each g storage block is associated with a "U"ser or a "G"roup. The respective name is part of the storage block name. These kinds of storage blocks will go away as soon as the respective user leaves the institute (i.e. his/her contract ends) or the respective group is dissolved.

Most storage blocks in the institute are associated with research studies. To be precise: they are associated with a specific study ID (5-digit number). There are usually two storage blocks per study ID: a faster one to do computations on (pt_) and one for storing results (p_), which has better protection against data loss.

Purpose (optional)

The purpose is usually a short word describing what the storage block is used for. Some possible purposes are:
  • cloud - Storage for cloud synchronisation
  • share - Per group shared storage area
  • software - Custom software or container images
  • (no purpose on a user storage block) - A Linux home directory for storing settings and application program profiles (browser, mail client, ...)
  • (no purpose on a study storage block) - This is the default for study storage blocks.
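Putting flags, association and purpose together, a name can be split into its components. This Python sketch is a hypothetical helper: parse_block_name and KNOWN_PURPOSES are not institute tools. It assumes the purpose is always the last _-separated part and recognizes only the purposes mentioned on this page. A block with an unlisted purpose would therefore be reported with that purpose folded into the association. Deadline markers such as -timeoutYYYYMMDD are not handled.

```python
from typing import NamedTuple, Optional

# Purposes mentioned on this page; assumed, and probably incomplete.
KNOWN_PURPOSES = {"cloud", "share", "software", "thesis"}


class BlockName(NamedTuple):
    flags: str              # letters before the first "_", e.g. "tu"
    association: str        # user, group or study, e.g. "burk2" or "gr_somegroup"
    purpose: Optional[str]  # e.g. "cloud", or None if no purpose is present


def parse_block_name(name: str) -> BlockName:
    flags, sep, rest = name.partition("_")
    if not sep or not flags or not rest:
        raise ValueError(f"not a storage block name: {name!r}")
    # The purpose, if any, is taken to be the last "_"-separated component.
    head, sep, last = rest.rpartition("_")
    if sep and last in KNOWN_PURPOSES:
        return BlockName(flags, head, last)
    return BlockName(flags, rest, None)
```

For example, parse_block_name("gh_gr_somegroup_share") yields flags "gh", association "gr_somegroup" and purpose "share".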

How do I get information about a storageblock?

Storage block metadata are considered institute-wide public knowledge. Use this command to get information about a storage block:

user@host > cbsdata -s [storageblock]

Example:
user@host:~> cbsdata -s pt_ftest
G3-Storageblock 'pt_ftest'
 type        [p/fast] Fast storage with fine-grained permissions (*without*
             backup)
 protection  Level 1 (Help: http://topic.cbs.mpg.de/dpp#1 )
 storage     free:4928 kiB, used:191 kiB, files:7
 permissions test-functional-2(Aw)
 path[linux] /data/pt_ftest
 paths[misc] X:\pt_ftest sftp://filegateway.cbs.mpg.de:/data/pt_ftest
 description Storageblock for functional testing (cbstool -t)
 help        Help on the 'pt'-prefix: cbsdata --list-flags pt_ftest
             Find more help here: http://topic.cbs.mpg.de/storageunified
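To use this metadata in a script, you can parse the output into fields. The Python sketch below assumes the layout shown in the example: one leading space before each field name, and wrapped values continued on lines indented further. The function names are hypothetical. This page does not describe the output as a stable interface, so the format may change.

```python
import re
import subprocess

KEY_LINE = re.compile(r"^ (\S+)\s+(.*)$")     # " key   value"
CONTINUATION = re.compile(r"^\s{2,}(\S.*)$")  # wrapped value, indented further


def parse_cbsdata(text):
    """Parse `cbsdata -s` output into a dict: field name -> list of value lines."""
    fields = {}
    current = None
    for line in text.splitlines():
        cont = CONTINUATION.match(line)
        if cont and current is not None:
            fields[current].append(cont.group(1))  # continuation of the previous field
            continue
        key = KEY_LINE.match(line)
        if key:
            current = key.group(1)
            fields[current] = [key.group(2)]
        # Anything else (e.g. the "G3-Storageblock '...'" header) is ignored.
    return fields


def storage_block_info(block):
    """Run cbsdata (available on institute Linux machines) and parse its output."""
    result = subprocess.run(["cbsdata", "-s", block],
                            capture_output=True, text=True, check=True)
    return parse_cbsdata(result.stdout)
```

For example, storage_block_info("pt_ftest")["path[linux]"] would give ["/data/pt_ftest"] for the output shown above.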

Hints:
  • "permissions" shows all users having access to the storage block, including their permissions in brackets. Possible permissions are:
    • r Read (i.e. copy data from there or just read files that are stored there)
    • R This flag only appears on "readonly"-storageblocks. It means: Write permission but since the storageblock is marked "readonly", only read access is possible now.
    • w Write (i.e. Change data or copy files to the storage block)
    • a Administrative permissions. This user is allowed to change permissions.
    • A Administrative permissions plus the user is IT's contact person for this storage block.
  • If you need permission to access a storage block, contact a user with 'a' or 'A' permissions.
  • If you are a user with admin permissions for a given storage block and you want to change someone's permissions, go to https://userportal.cbs.mpg.de/storageunified/manage_permissions .
    *Warning:* Changing permissions with chown/chmod/... won't work. These changes are not permanent.
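You can decode the permission letters the same way, for example to find the users with 'a' or 'A' whom you can ask for access. This Python sketch assumes that the permissions field lists entries of the form user(letters) separated by whitespace. Only single-user examples appear on this page. Entries containing letters other than r, R, w, a and A are silently skipped. The helper names are hypothetical.

```python
import re

# Letter meanings as listed in the hints above.
PERMISSION_LETTERS = {
    "r": "read",
    "R": "read (write permission, but the block is read-only)",
    "w": "write",
    "a": "administrative",
    "A": "administrative (IT contact person)",
}

# One entry such as "burk2(Aw)": a user name followed by letters in brackets.
ENTRY = re.compile(r"([^\s()]+)\(([rRwaA]*)\)")


def parse_permissions(field):
    """Map each user in a permissions field to the meanings of their letters."""
    return {user: [PERMISSION_LETTERS[c] for c in letters]
            for user, letters in ENTRY.findall(field)}


def admins(field):
    """Users marked with 'a' or 'A', i.e. the people who can grant access."""
    return [user for user, letters in ENTRY.findall(field)
            if set(letters) & {"a", "A"}]
```

For example, admins("test-functional-2(Aw)") returns ["test-functional-2"], the contact for the example block above.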

How do I know the available storage?

Linux and Windows programs will report the free space with a bias of 10 GiB (i.e. you have 10 GiB less space than reported). IT apologizes for this inconvenience. This 10 GiB offset increases the performance of the file service tremendously (by orders of magnitude), which is why IT decided to implement it (detailed explanation in German).
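The correction is simple arithmetic: subtract 10 GiB from the free space the operating system reports. For example, if df reports 12 GiB free, about 2 GiB is actually usable. Here is a minimal Python sketch, assuming the bias is exactly 10 GiB (10 × 1024³ bytes) as stated above. The helper names are made up for this example, and cbsdata or Q remain the tools to rely on for correct figures.

```python
import shutil

GIB = 1024 ** 3
REPORTED_BIAS = 10 * GIB  # offset described on this page


def usable_free_bytes(reported_free):
    """Free space actually usable, given the free space reported by df/Explorer."""
    return max(0, reported_free - REPORTED_BIAS)


def usable_free_at(path):
    # shutil.disk_usage returns what the operating system reports, i.e. including the bias.
    return usable_free_bytes(shutil.disk_usage(path).free)
```

For example, usable_free_at("/data/pt_ftest") estimates the usable space in that block on an institute Linux machine.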

The storage management tool cbsdata and the tool Q will show correct values which are updated every full hour. Here's an example output in Linux:

burk2@oxygen:~ > cbsdata -s tu_burk2_cloud
G3-Storageblock 'tu_burk2_cloud'
 type        [user/cloud] Per-user local cloud cache storage (*without* backup)
 protection  Level 1 (Help: http://topic.cbs.mpg.de/dpp#1 )
 storage     free:36349 MiB, used:14750 MiB, files:1543
 permissions burk2(Aw)
 path[linux] /data/tu_burk2_cloud
 paths[misc] X:\tu_burk2_cloud
             sftp://filegateway.cbs.mpg.de:/data/tu_burk2_cloud
 description Cloud storage of burk2
 help        Help on the 'tu'-prefix: cbsdata --list-flags tu_burk2_cloud
             Find more help here: http://topic.cbs.mpg.de/storageunified

someuser@somehost:~ > Q
Dr. Some User <someuser@cbs.mpg.de>
 Home folder   3107 Mi/9765 Mi (31.83%)
 Email mailbox 460 /1024 Mi (0.00%)
I: Values are updated once per hour (at 01')
I: Q is just about key storage locations. Use this command to get information
I: about any /data- storage block:  cbsdata -s [storage_block_id]

There's permission trouble when copying files from my home directory to another /data folder. What's going on?

The home folder has slightly different permissions applied on a technical level than other storage blocks. To be precise: the ACL mask is 0 and not 7 (a.k.a. rwx). Problems occur, e.g., when you download things to your home directory and move/copy them elsewhere a week later via a "smart" Linux file manager that tries to copy permission information together with the respective files.

Solutions:
  • Do not store data in your home directory. See also: <https://topic.cbs.mpg.de/details#smallhome> .
  • Set group permissions to 7 for all files and folders you copy from your home directory by ...
    • using the command chmod -R g+rwx [folder] or
    • using a file manager and applying all possible permissions to the "group".
  • Wait up to a week. All permissions are corrected automatically once every weekend.
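chmod -R g+rwx helps because, under the usual POSIX ACL semantics, the group bits of the file mode map to the ACL mask. Setting them therefore raises the mask from 0 to 7. The following Python sketch is a rough equivalent of that command for use in scripts. grant_group_rwx is a hypothetical helper. Symbolic links are skipped rather than followed.

```python
import os
import stat


def grant_group_rwx(top):
    """Recursively add group rwx to `top` and everything below it (like chmod -R g+rwx).

    With POSIX ACLs, changing the group bits via chmod sets the ACL mask entry.
    """
    def add_group_bits(path):
        mode = os.lstat(path).st_mode
        if not stat.S_ISLNK(mode):  # leave symbolic links alone
            os.chmod(path, stat.S_IMODE(mode) | stat.S_IRWXG)

    add_group_bits(top)
    # os.walk does not descend into symlinked directories by default.
    for root, dirs, files in os.walk(top):
        for name in dirs + files:
            add_group_bits(os.path.join(root, name))
```

For example, grant_group_rwx("/data/p_ftest/copied_folder") would fix a folder copied from the home directory. The path is illustrative.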

Other applications

The purpose of this service is to manage all kinds of user data. This is particularly important to keep research data off non-secured disks, but it's not restricted to that. Other applications include:
  • Cloud storage (schema: tu_username_cloud)
  • Dissertations, master theses (schema: tu_username_thesis)
  • Custom software for research (schema: tu_username_software for personal use or p_gr_groupname_software for group-wide use)
  • Home directories (schema: hu_username)
  • Research group associated sharing storage areas (e.g. gh_gr_somegroup_share)

Topic revision: 30 Aug 2024, Burk2
This site is powered by Foswiki. Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.