Block-based storage concept of the MPI CBS
Summary: You need to get storage for a research project? You came to the right page.
Overview
This service is about storing your data in a structured way. It is best practice to separate different sets of data (e.g. different research projects, your home folder, group share folders) and manage them independently (more information about this concept). You have a home directory storage block by default. Additional storage blocks for personal purposes can be requested via the TicketSystem.
This system's primary use is for research data. However, there are other applications.
Research data is usually split into two storage blocks:
- p_... for raw data, results, documents, etc. (data you want to keep around for the duration of the research study). This storage class is better protected (DataProtectionPolicy) but slower.
- pt_... for doing the computations on. This kind of storage can be bigger and is faster, but its protection level is lower (DataProtectionPolicy).
Both storage blocks are not suitable as an archive. When a project is finished, you have to take care of putting all your important data into an Archive.
FAQ
How to access data via a standard institute Linux workstation or compute server?
Type this command to get an overview of the storage blocks for which you have read permissions or higher:
user@host > mydata
Go to the respective folder via cd like this:
user@host > cd /data/p_ftest
The folder is invisible until you access it explicitly (more information).
How to access data in a CentralWindows session?
Use drive X:. It contains a list of links representing all the storage blocks in the StorageUnified system. You'll only see storage blocks you have access to.
Why do I have no access although I have admin permissions?
There are people in the institute who have administrative power over a lot of storage blocks (e.g. the data protection officer). To prevent unnecessary accidents, the permissions model is like this:
- "admin permissions" are only admin permissions. They enable a person to grant other permissions to people/groups, including themselves.
- "read permissions" allow you to read data from a storage block.
- "write permissions" allow you to read and write data from/to a storage block.
If you have administrative permissions and you want to read or write data, go to <https://userportal.cbs.mpg.de/storageunified/manage_permissions> and grant them to yourself.
How to access data on MacOSX or via registered institute laptops?
A gateway service (FileGatewayUser) provides access via SFTP to all the protected storage of the institute.
How do I get permissions to access a certain storage block?
Permissions can be requested only by the owner of some data or by a person appointed by the owner. All persons that can grant permissions are marked with an a or an A in the permissions line of the Storageblock overview. Please ask one of them to grant permissions to you.
Keep in mind:
- It's not sufficient to forward an email in which someone allows you to access some data.
- One of the persons having administrative permissions has to contact IT directly.
- Currently there's no self-service for permission changes. IT has to be involved to grant/revoke permissions. This will change in the future.
Why the funny names?
Usually you don't need to worry about the storage block names. Just remember, bookmark or write down the name and use the respective path in a file manager. Since the names are hard to remember, you might find this tool useful, which shows all storage blocks you have access to:
user@host > mydata
However, a storage block name tells you a lot about the mechanisms working behind the curtain. The name consists of these components:
Flags
Each letter before the first _ has a special meaning:
- d - Auto-delete. Data here is subject to some form of automatic deletion. The specifics depend on the storage block. It might be storage for temporary data, or the storage block is marked with a deadline (-timeoutYYYYMMDD).
- e - Accessible from the internet. Such a storage block is physically accessible from the outside (although still subject to access control), while all other storage blocks are physically on servers that cannot be accessed from the internet.
- g - Per-group storage, only accessible for a single group. The storage block name contains the name of a user group. There's a 1:1 relation (without exceptions!) between group members and users having write permissions there.
- h - High value data. This storage block is assigned data protection Level 3 (DataProtectionPolicy).
- p - Flexible permission management is in place for this storage block, usually for research projects / studies. Any user of the institute can be assigned read-only, write or administrative permissions.
- t - Temporary data. This storage block's data protection level is 1 (DataProtectionPolicy), i.e. protection against disk failures only.
- u - Per-user storage, only accessible for a single user. No exceptions are possible.
If the flags contain neither t nor h, the contained data is guaranteed to be assigned data protection Level 2 (DataProtectionPolicy).
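The flag rules above can be sketched as a small shell helper. This is only an illustration under stated assumptions: describe_flags and protection_level are hypothetical names, not institute tools, and letting h take precedence over t is an assumption, since the text does not say what happens if both appear. The authoritative information comes from cbsdata -s.

```shell
# Hypothetical helpers for illustration; not part of the institute's tooling.

# Print one line per flag letter found before the first underscore.
describe_flags() {
    flags=${1%%_*}                 # e.g. "pt" for "pt_ftest"
    while [ -n "$flags" ]; do
        rest=${flags#?}            # everything after the first letter
        letter=${flags%"$rest"}    # the first letter itself
        case "$letter" in
            d) echo "d: auto-delete" ;;
            e) echo "e: accessible from the internet" ;;
            g) echo "g: per-group storage" ;;
            h) echo "h: high value data" ;;
            p) echo "p: flexible permission management" ;;
            t) echo "t: temporary data" ;;
            u) echo "u: per-user storage" ;;
            *) echo "$letter: unknown flag" ;;
        esac
        flags=$rest
    done
}

# Print the data protection level implied by the flags.
# Assumption: h takes precedence if both h and t were ever present.
protection_level() {
    case "${1%%_*}" in
        *h*) echo 3 ;;
        *t*) echo 1 ;;
        *)   echo 2 ;;
    esac
}

describe_flags pt_ftest
protection_level pt_ftest
protection_level p_ftest
```

For pt_ftest the helper reports Level 1, which agrees with the "protection" line that cbsdata -s prints for that storage block.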
Association
Each u and each g storage block is associated with a "U"ser or a "G"roup. The respective name is part of the storage block name. These kinds of storage blocks will go away as soon as the respective user leaves the institute (which means his/her contract ends) or the respective group is dissolved.
Most storage blocks in the institute are associated with research studies. To be precise: they are associated with a specific study ID (a 5-digit number). There are usually two storage blocks per study ID: one to do computations on, which is faster (pt_), and one for storing results (p_), which has better protection against data loss.
Purpose (optional)
The purpose is usually a short word describing what the storage block is being used for. Some possible purposes are:
- cloud - Storage for cloud synchronisation
- share - Per-group shared storage area
- software - Custom software or container images
- (no purpose on a user storage block) - A Linux home directory for storing settings and application program profiles (browser, mail client, ...)
- (no purpose on a study storage block) - This is the default for study storage blocks.
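The three name components (flags, association, optional purpose) can be separated with plain shell parameter expansion. The following is a rough sketch, not an institute tool: split_block_name is a hypothetical name, and the list of recognised purposes (cloud, share, software, thesis) is an assumption taken from the examples on this page. A user or group name that itself ends in one of these words would be misread.

```shell
# Hypothetical helper for illustration; not part of the institute's tooling.
# Splits a storage block name into flags, association and optional purpose.
split_block_name() {
    name=$1
    flags=${name%%_*}      # letters before the first underscore
    rest=${name#*_}        # everything after the first underscore
    case "$rest" in
        *_cloud|*_share|*_software|*_thesis)
            purpose=${rest##*_}    # last component
            assoc=${rest%_*}       # everything before it
            ;;
        *)
            purpose=""             # no purpose: home or default study block
            assoc=$rest
            ;;
    esac
    echo "flags=$flags association=$assoc purpose=$purpose"
}

split_block_name tu_burk2_cloud
split_block_name pt_ftest
```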
Storage block metadata are considered institute-wide public knowledge. Use this command to get information about a storage block:
user@host > cbsdata -s [storageblock]
Example:
user@host:~> cbsdata -s pt_ftest
G3-Storageblock 'pt_ftest'
type [p/fast] Fast storage with fine-grained permissions (*without*
backup)
protection Level 1 (Help: http://topic.cbs.mpg.de/dpp#1 )
storage free:4928 kiB, used:191 kiB, files:7
permissions test-functional-2(Aw)
path[linux] /data/pt_ftest
paths[misc] X:\pt_ftest sftp://filegateway.cbs.mpg.de:/data/pt_ftest
description Storageblock for functional testing (cbstool -t)
help Help on the 'pt'-prefix: cbsdata --list-flags pt_ftest
Find more help here: http://topic.cbs.mpg.de/storageunified
Hints:
- "permissions" shows all users having access to the storage block, including their permissions in brackets. Possible permissions are:
  - r - Read (i.e. copy data from there or just read files that are stored there)
  - R - This flag only appears on "readonly" storage blocks. It means: write permission, but since the storage block is marked "readonly", only read access is possible for now.
  - w - Write (i.e. change data or copy files to the storage block)
  - a - Administrative permissions. This user is allowed to change permissions.
  - A - Administrative permissions; in addition, the user is IT's contact person for this storage block.
- Do you require permission to access a storage block? Contact a user with 'a' or 'A' permissions.
- If you are a user with admin permissions for a given storage block and you want to change someone's permissions, go to https://userportal.cbs.mpg.de/storageunified/manage_permissions .
*Warning:* Using chown/chmod/... won't work. These changes are not permanent.
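To find out whom to ask, it can help to pick apart an entry from the "permissions" line. The sketch below assumes that every entry has the form name(flags), as in the example output above; describe_entry is a hypothetical helper, not an institute tool, and how several entries are separated on one line is not covered here.

```shell
# Hypothetical helper for illustration; not part of the institute's tooling.
# Assumption: an entry looks like "name(flags)", as in the example output.
describe_entry() {
    entry=$1
    user=${entry%"("*}        # text before the last "("
    perms=${entry##*"("}      # text after the last "("
    perms=${perms%")"}        # drop the closing ")"
    case "$perms" in
        *[aA]*) grant="can grant permissions" ;;
        *)      grant="cannot grant permissions" ;;
    esac
    echo "$user [$perms] $grant"
}

describe_entry "test-functional-2(Aw)"
describe_entry "burk2(Aw)"
```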
How do I know the available storage?
Linux and Windows programs report the free space with a bias of 10 GiB (i.e. you have 10 GiB less space than reported). IT apologizes for this inconvenience. This 10 GiB offset causes a tremendous increase in performance (orders of magnitude) of the file service, which is why IT decided to implement it (detailed explanation in German).
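If you only have a value reported by a generic program such as df, the correction is a simple subtraction. This is a minimal sketch: usable_kib is a hypothetical helper, not an institute tool, and clamping negative results to 0 is an assumption. The tools cbsdata and Q remain the authoritative source.

```shell
# Hypothetical helper for illustration; not part of the institute's tooling.
# 10 GiB expressed in KiB, the unit reported by "df -k".
OFFSET_KIB=$((10 * 1024 * 1024))

# Subtract the 10 GiB offset from a reported free-space value (in KiB).
# Assumption: results below zero are clamped to 0.
usable_kib() {
    usable=$(($1 - OFFSET_KIB))
    if [ "$usable" -lt 0 ]; then
        usable=0
    fi
    echo "$usable"
}

usable_kib 20971520    # 20 GiB reported by a generic tool

# Possible use with df (POSIX output format, column 4 is the available space):
# usable_kib "$(df -Pk /data/pt_ftest | awk 'NR==2 {print $4}')"
```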
The storage management tool cbsdata and the tool Q will show correct values, which are updated every full hour. Here's an example output in Linux:
burk2@oxygen:~ > cbsdata -s tu_burk2_cloud
G3-Storageblock 'tu_burk2_cloud'
type [user/cloud] Per-user local cloud cache storage (*without* backup)
protection Level 1 (Help: http://topic.cbs.mpg.de/dpp#1 )
storage free:36349 MiB, used:14750 MiB, files:1543
permissions burk2(Aw)
path[linux] /data/tu_burk2_cloud
paths[misc] X:\tu_burk2_cloud
sftp://filegateway.cbs.mpg.de:/data/tu_burk2_cloud
description Cloud storage of burk2
help Help on the 'tu'-prefix: cbsdata --list-flags tu_burk2_cloud
Find more help here: http://topic.cbs.mpg.de/storageunified
someuser@somehost:~ > Q
Dr. Some User <someuser@cbs.mpg.de>
Home folder 3107 Mi/9765 Mi (31.83%)
Email mailbox 460 /1024 Mi (0.00%)
I: Values are updated once per hour (at 01')
I: Q is just about key storage locations. Use this command to get information
I: about any /data- storage block: cbsdata -s [storage_block_id]
There's permission trouble when copying files from my home directory to another /data folder. What's up?
The home folder has slightly different permissions applied on a technical level than other storage blocks. To be precise: the ACL mask is 0 and not 7 (a.k.a. rwx). Problems will occur, for example, when you download things to your home directory and move/copy them elsewhere a week later via a "smart" Linux file manager that tries to copy permission information together with the respective files.
Solutions:
- Do not store data in your home directory. See also: <https://topic.cbs.mpg.de/details#smallhome> .
- Set group permissions to 7 for all files and folders you copy from your home directory by ...
  - using the command chmod -R g+rwx [folder] or
  - using a file manager and applying all possible permissions to the "group".
- Wait for a week. Every weekend all permissions are corrected automatically.
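The second solution can be wrapped into one step that copies and then fixes the group permissions on the copy. This is only a sketch: copy_with_group_access is a hypothetical helper, not an institute tool, and the paths in the example call are placeholders.

```shell
# Hypothetical helper for illustration; not part of the institute's tooling.
# Copies a file or folder into a destination folder and then grants the
# group full permissions on the copy, as described in the second solution.
copy_with_group_access() {
    src=$1
    dest_dir=$2
    cp -R "$src" "$dest_dir"/ &&
        chmod -R g+rwx "$dest_dir/$(basename "$src")"
}

# Example call (paths are placeholders):
# copy_with_group_access "$HOME/results" /data/pt_ftest
```

The chmod runs only on the copy, so the permissions in the home directory stay untouched.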
Other applications
The purpose of this service is to manage all kinds of user data. It is particularly important for keeping research data off non-secured disks, but it is not restricted to that. Other applications include:
- Cloud storage (schema: tu_username_cloud)
- Dissertations, master theses (schema: tu_username_thesis)
- Custom software for research (schema: tu_username_software for personal use or p_gr_groupname_software for group-wide use)
- Home directories (schema: hu_username)
- Research-group-associated sharing storage areas (e.g. gh_gr_somegroup_share)