Table of contents
  1. Check the volume of folder in Linux
  2. nohup
  3. Backup in Linux
  4. Backup using bash script
  5. Backup MongoDB
  6. Periodically delete folders or files using crontab
  7. Customizing your terminal using OhMyZsh
  8. vim configuration
  9. How to add file header using vim
  10. How to manage users and grant permissions in Linux
  11. Manage a group and user in linux
  12. mkdir
  13. tar
  14. find
  15. Using awk
  16. download multiple illumina project data through xargs
  17. How to find all files containing specific text (string) on Linux
  18. What is the difference between the Bash operators [[ vs [ vs ( vs ((?
  19. How to parse bash script arguments

Check the volume of folder in Linux

du -sh folderName

-h. Shows sizes in a human-readable format.
-s. Summarizes the total for each argument.
-a. Includes files as well as directories.

e.g

(base)du -sh streamlit-combine  
12M     streamlit-combine
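
As a small sketch of how these options combine, the following builds a throwaway directory and uses du -s to find its largest subfolder. The names demo_dir, small and large are made up for illustration, and GNU coreutils is assumed.

```shell
# Build a throwaway tree so the example does not touch real data.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/small" "$demo_dir/large"
head -c 1024 /dev/urandom > "$demo_dir/small/a.bin"       # 1 KiB of data
head -c 1048576 /dev/urandom > "$demo_dir/large/b.bin"    # 1 MiB of data

# -s prints one total per argument; -k reports KiB so sort -n can order the rows.
largest=$(du -sk "$demo_dir"/*/ | sort -n | tail -n 1 | cut -f2)
echo "largest subfolder: $largest"

# The same totals in human-readable form.
du -sh "$demo_dir"/*/
```

Sorting on the numeric -k output rather than on the -h output keeps the ordering independent of unit suffixes.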

nohup

nohup, short for “no hang up”, is a Linux command that keeps a process running after you exit the shell or terminal.

E.g. uploading a whole Illumina run from a local machine using the bs command line:

nohup $bs upload run -n reupload-240312At0317 -t NovaSeq6000 ./240312_A01905_0076_AH7KH7DSX > ~/projects/illumina/UploadRun.out

Note:

  1. -n : the name you’d like to show up in BaseSpace
  2. -t : the instrument that was used for sequencing
  3. ./240312_A01905_0076_AH7KH7DSX : the whole run folder on the machine
  4. ~/projects/illumina/UploadRun.out : the file the output is redirected to, in case you need to check the log.
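
A minimal sketch of the same pattern with a stand-in command: the sh -c job below is a hypothetical placeholder for a long upload. It adds two things that are often wanted, both assumptions rather than part of the notes: 2>&1 so that errors land in the same log, and a trailing & so the job runs in the background.

```shell
log_file=$(mktemp)

# The quoted sh -c job is a placeholder for a long-running command.
# 2>&1 sends errors to the same log; & puts the job in the background.
nohup sh -c 'sleep 1; echo "upload finished"' > "$log_file" 2>&1 &
job_pid=$!
echo "started background job with PID $job_pid"

# Only for this demo: wait for the job so the log can be shown.
wait "$job_pid"
cat "$log_file"
```

In real use you would skip the wait and check the log later, for example with tail -f on the log file.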

Backup in Linux

If you build a database and you’d like to back it up periodically, you can use rsync and crontab.

  1. Use a user account that has sudo permission.
  2. Test rsync : rsync -aAXv --delete --dry-run --exclude=/dev/* --exclude=/proc/* --exclude=/sys/* --exclude=/tmp/* --exclude=/run/* --exclude=/mnt/* --exclude=/media/* --exclude="swapfile" --exclude="lost+found" --exclude=".cache" --exclude="Downloads" --exclude=".VirtualBoxVMs" --exclude=".ecryptfs" / /run/media/alu/ALU/ . --exclude accepts shell-style wildcard patterns (not regular expressions). --delete means that files deleted from the source are also deleted from the target. --dry-run only reports what would be transferred without changing anything, so you can use it to check that the selected files are the ones you want.
  3. Enter the command : sudo crontab -e. This edits root’s crontab, so the scheduled commands run with root permission and you don’t need to prefix them with sudo.
  4. Write down the rsync command that you’d like to run periodically, preceded by its schedule.
  5. Save it.

If you’d like to run the command every minute, write the time fields as : * * * * *
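
For reference, the five time fields are minute, hour, day of month, month and day of week. The entries below are illustrative only; /path/to/backup.sh is a hypothetical script, not one from these notes.

```
# minute  hour  day-of-month  month  day-of-week  command

# every minute
*  *  *  *  *  /path/to/backup.sh

# every day at 18:00
0  18  *  *  *  /path/to/backup.sh

# every Friday at 01:00
0  1  *  *  5  /path/to/backup.sh
```

Comments are kept on their own lines because cron passes everything after the fifth field to the shell as the command.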

Sometimes you do not have sudo permission to run the rsync command. Here’s another way to do it.

  1. Give the sudo user write permission on the folder: setfacl -R -m u:username:rwx myfolder
  2. Update the crontab file in the sudo user’s environment: crontab -e

A safe way to store your data is to back it up using crontab and to keep your source code on GitHub.

Backup using bash script

Sometimes the folder you’d like to back up cannot be copied, or only the folder structure is copied and the copied folders contain no files. In that case, try the following method.

  1. Create a bash script file and write the rsync backup command in it. e.g: rsync -av --delete --exclude="__pycache__" --exclude=".pytest_cache" --exclude="node_modules" --exclude=".git" --exclude='temp' --exclude='tests' /home/pathToSourceFolder /pathToTargetFolder
  2. Open the crontab file with sudo: sudo crontab -e
  3. Write the following entry to back up the data periodically
# Backup projects every day at 6:00PM
0 18 * * * sh /pathToBashFolder/backupProjectsAt6PM.sh

Example

/pathToBashFolder/backupProjectsAt6PM.sh

rsync -av --delete --exclude="__pycache__" --exclude=".pytest_cache" --exclude="node_modules" --exclude=".git" --exclude='temp' --exclude='tests'  /home/pathToSourceFolder  /pathToTargetFolder

# Here you can edit this file and add more backup projects

crontab file

# Backup projects every day at 6:00PM
0 18 * * * sh /pathToBashFolder/backupProjectsAt6PM.sh

Note

A trailing slash on the source changes rsync’s default behavior so that it does not create an additional directory level at the destination. You can think of a trailing / on a source as meaning “copy the contents of this directory” as opposed to “copy the directory by name”, but in both cases the attributes of the containing directory are transferred to the containing directory on the destination. In other words, each of the following commands copies the files in the same way, including their setting of the attributes of /dest/foo:

  rsync -av /src/foo /dest
  rsync -av /src/foo/ /dest/foo

Backup MongoDB

In crontab (a literal % must be written as \% there, because cron treats an unescaped % as a newline):

3 3 * * * mongodump --out /var/backups/mongobackups/$(date +'\%Y-\%m-\%d')

To keep only the last 7 days of backups and save disk space:

1 3 * * * find /var/backups/mongobackups/ -mtime +7 -exec rm -rf {} \;

Periodically delete folders or files using crontab

e.g.


# Every Friday at 1:00AM, delete folders in the target that are older than 60 days
0 1 * * 5  find /mnt/target/ -maxdepth 1 -mindepth 1 -type d -mtime +60 -exec rm -rf {} \;

Be careful here: use -maxdepth 1 -mindepth 1 to restrict the match to the folders directly inside the target folder and to exclude the target folder itself. Otherwise you may delete the target folder. -maxdepth and -mindepth should be placed before the other options.

Note:

-maxdepth 1 : find descends at most 1 level. Level 0 is the target itself, level 1 is the entries directly inside the target.

-mindepth 1 : find starts matching at level 1. Level 0 would include the target itself, level 1 starts from the entries inside the target.

So -maxdepth 1 -mindepth 1 matches only the entries directly inside the target, not the target folder itself.

-type d : match folders only. To match files, replace d with f. Check the other types with man find.

-mtime +60 : folders last modified more than 60 days ago

-exec rm -rf {} \; : delete the folders that satisfy the conditions above. You must place \; at the end to terminate the -exec command.
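
The effect of -mindepth 1 -maxdepth 1 can be checked safely on a throwaway tree before the command goes into crontab. This sketch assumes GNU touch (for -d '70 days ago') and uses made-up folder names; it first lists the matches with -print, then deletes them.

```shell
target=$(mktemp -d)
mkdir -p "$target/old_run/nested" "$target/new_run"

# Backdate one folder so it looks 70 days old (GNU touch syntax).
touch -d '70 days ago' "$target/old_run"

# Dry run: -print lists what would be deleted, without deleting anything.
matches=$(find "$target" -mindepth 1 -maxdepth 1 -type d -mtime +60 -print)
echo "would delete: $matches"

# Real run: only old_run is removed; the target itself and new_run survive.
find "$target" -mindepth 1 -maxdepth 1 -type d -mtime +60 -exec rm -rf {} \;
ls "$target"
```

Swapping -exec rm -rf {} \; for -print is a cheap way to review any find-based cleanup before scheduling it.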

Customizing your terminal using OhMyZsh

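This link explains how to configure the Oh My Zsh shell in the terminal.

If you are making the changes on Ubuntu, note the following:

  1. Install OhMyZsh by entering in the terminal : $ sh -c "$(wget https://raw.github.com/ohmyzsh/ohmyzsh/master/tools/install.sh -O -)"
  2. Change the highlight color of zsh-syntax-highlighting, otherwise the text is hard to read on a Dark Solarized background: add ZSH_AUTOSUGGEST_HIGHLIGHT_STYLE="fg=yellow" to ~/.zshrc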

vim configuration

Another web source shows the details of .vimrc. My vim configuration is as follows; create a .vimrc in the home directory if there’s no such file.

set number
set nocompatible

" highlight the syntax
syntax on

set showmode
set autoindent
set smartindent
set expandtab
set tabstop=4
set shiftwidth=4
set softtabstop=4

set showmatch
set hlsearch
set incsearch
set ignorecase

" Auto-close quotes and brackets
inoremap ' ''<ESC>i
inoremap " ""<ESC>i
inoremap ( ()<ESC>i
inoremap [ []<ESC>i

How to add file header using vim

Use spaces rather than tabs to align the fields in the header template; otherwise the search patterns below will not match.

" Add file header
au bufnewfile *.sh 0r ~/.vim/sh_header.temp
au bufnewfile *.py 0r ~/.vim/py_header.temp
autocmd bufnewfile *.sh,*.py exe "1," . 10 . "g/Script Name    :.*/s//Script Name    :".expand("%")
autocmd bufnewfile *.sh,*.py exe "1," . 10 . "g/Creation Date  :.*/s//Creation Date  :".strftime("%Y-%m-%d")
autocmd Bufwritepre,filewritepre  *.sh,*.py exe "1," . 11 . "g/Last Modified  :.*/s//Last Modified  :".strftime("%c")

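How to manage users and grant permissions in Linux

Suppose we have written some programs that we want other people to run, but we do not want them to operate on the files, for example to view or modify them. They should only be able to run them.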

  1. Create a group. If you use your own account, a group with the same name as your username was created along with it, e.g. “appUser”.
  2. Create a new user : adduser --ingroup appUser testUser
  3. Use ls -alh to show the status of the file, e.g. -rwxr-xr-x 1 user group 21K Oct 21 14:54 utils.py. In -rwxr-xr-x:
    • if this is a directory, the first character is d
    • the next three characters (rwx) are the owner’s permissions: r (read - 4), w (write - 2), x (execute - 1)
    • the following three characters (r-x) mean that members of the file’s group have only read and execute permission
    • the last three characters (r-x) mean that other users who don’t belong to the group have read and execute permission
  4. Change the permissions of folders and files : chmod 711 folder OR chmod 755 scriptFile
    • if the folder has ‘x’ permission for the group, group users can use cd to enter that folder
    • if the folder has ‘r’ permission for the group, group users can use ls to list the contents of that folder
    • if you’d like a user to run a script, you need to give the group at least ‘x’ permission on the parent folders. The parent folders should have at least the same permissions as the child folders.
    • e.g. drwxr-xr-x : group users can use cd and ls
    • e.g. drwx--xr-x : group users can only use cd, not ls
  5. Put the executable scripts in a separate folder.
  6. Run some tests to confirm that the setup works.
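
The octal digits passed to chmod are sums of the values listed above (r = 4, w = 2, x = 1), one digit each for owner, group and others. A small sketch on a temporary folder, assuming GNU stat for the -c format option; the folder name scripts is arbitrary.

```shell
work_dir=$(mktemp -d)
mkdir "$work_dir/scripts"

# 7 = 4+2+1 (rwx) for the owner, 1 = x only for group and others:
# they can cd into the folder but cannot list it with ls.
chmod 711 "$work_dir/scripts"
mode_711=$(stat -c '%a %A' "$work_dir/scripts")
echo "$mode_711"

# 5 = 4+1 (r-x): group and others can both cd and ls.
chmod 755 "$work_dir/scripts"
mode_755=$(stat -c '%a %A' "$work_dir/scripts")
echo "$mode_755"
```

Printing both the octal (%a) and symbolic (%A) forms makes it easy to compare the number you typed with what ls -l will show.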

Manage a group and user in linux

  • show groups : cat /etc/group
  • show users : cat /etc/passwd
  • add a group : sudo groupadd appUser
  • delete a group : sudo groupdel appUser
  • display who is a member of a group : getent group group-name
  • add a user to a group : sudo usermod -a -G appUser testuser. You need to be logged in as a user that has sudo permission.
  • create a new user in a group : sudo useradd -g primary_group -G another_group -s /usr/bin/bash -m new_user. Here -m creates a home directory for the new user; -s sets the login shell.
  • change the password for the new user: sudo passwd new_user
  • delete a user : userdel username; userdel -r username. Use the -r (--remove) option to make userdel also remove the user’s home directory and mail spool
  • Create a new user in Linux : sudo useradd -s /usr/bin/bash -m new_user
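
Both /etc/group and /etc/passwd are colon-separated, so the read-only lookups above can be narrowed with cut or awk. This sketch only reads the two files and changes nothing; it assumes a root account exists, which holds on typical Linux systems.

```shell
# Field 1 of /etc/group is the group name.
group_names=$(cut -d: -f1 /etc/group)
group_count=$(printf '%s\n' "$group_names" | wc -l)
echo "number of groups: $group_count"

# In /etc/passwd, field 1 is the user name, field 3 the UID, field 7 the login shell.
root_uid=$(awk -F: '$1 == "root" {print $3}' /etc/passwd)
root_shell=$(awk -F: '$1 == "root" {print $7}' /etc/passwd)
echo "root has UID $root_uid and shell $root_shell"
```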

mkdir

  1. mkdir -p d1/d2/d3 creates several nested folders at once; it works even if d2 and d3 do not exist yet.
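
A quick check of the difference on a temporary folder: plain mkdir fails when the parent folders are missing, while mkdir -p creates them and also succeeds if the folders already exist.

```shell
base=$(mktemp -d)

# Without -p this fails because d1 and d1/d2 do not exist yet.
if mkdir "$base/d1/d2/d3" 2>/dev/null; then
    plain_result="created"
else
    plain_result="failed"
fi
echo "plain mkdir: $plain_result"

# With -p the whole chain is created in one call.
mkdir -p "$base/d1/d2/d3"

# Running it again is not an error.
mkdir -p "$base/d1/d2/d3" && echo "second mkdir -p: ok"
```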

tar

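  1. tar -xf v5.24.1.tar.gz --strip-components=1 --wildcards "*/data" "*/environment.yaml"

-x means extract, -f names the archive file, --strip-components=1 (which can be abbreviated to --strip 1) means the first directory level is not recreated, and --wildcards tells GNU tar to treat the member names as patterns. This command extracts data and environment.yaml from v5.24.1.tar.gz. For a detailed explanation of tar, see What does --strip-components -C mean in tar?

-c means create a new archive containing the specified items. For example, to pack files with tar you can use tar -cf archive.tar a/b/c/FILE, where a/b/c/FILE is the file to pack and archive.tar is the name of the resulting archive.

To list the contents of a tar file, use tar -tf archive.tar

  • tar -xf v5.24.1.tar.gz : extract the archive
  • tar -cf archive.tar a/b/c/FILE : create an archive
  • tar -tf archive.tar : list the files in the archive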
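
The three operations can be tried end to end on a throwaway tree. In this sketch the folder project-1.0 and its files are made up, and -z (gzip compression) and -C (change directory) are additions that the notes above do not cover.

```shell
work=$(mktemp -d)
mkdir -p "$work/src/project-1.0/data" "$work/out"
echo "name: demo" > "$work/src/project-1.0/environment.yaml"
echo "1,2,3" > "$work/src/project-1.0/data/values.csv"

# -c create, -z gzip, -f archive name; -C changes directory before adding files.
tar -czf "$work/project.tar.gz" -C "$work/src" project-1.0

# -t lists the members without extracting them.
listing=$(tar -tzf "$work/project.tar.gz")
echo "$listing"

# -x extract; --strip-components=1 drops the leading project-1.0/ directory.
tar -xzf "$work/project.tar.gz" -C "$work/out" --strip-components=1
ls "$work/out"
```

Listing with -t first shows how many leading path components there are, which tells you what number to give --strip-components.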

find

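Find files that meet certain criteria

find /path/to/dir -type f -name 'pattern'
  • the first argument after find is the path, i.e. the folder in which to search
  • -type: the type of entry to look for, f: file ; d: directory
  • -name: the name to look for, wrapped in quotes. For example, to find Python files (files ending in .py), use "*.py"

If the extension may be in upper or lower case, use -iname, which is case-insensitive.

find /path/to/dir -type f -iname '*.jpg'

To find files with several different extensions, combine the patterns with -o (or), as follows:

find /path/to/dir -type f \( -name '*.py' -o -name '*.ipynb' \)

Put the combined conditions inside parentheses. Leave a space on both sides of each parenthesis and escape each parenthesis with a backslash.

  • -size: the size of the files to look for
k       kilobytes (1024 bytes)
M       megabytes (1024 kilobytes)
G       gigabytes (1024 megabytes)
T       terabytes (1024 gigabytes)
P       petabytes (1024 terabytes)

For example, to find files larger than 1M in the current folder

find . -size +1M

find . -size -1M

The + in +1M means the file is larger than 1M. To find files smaller than a given size, use the - sign. Note that GNU find rounds sizes up to whole units, so -size -1M matches only empty files; use -size -1024k to find files smaller than 1M.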
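
The name and size tests can be tried on a small throwaway tree. The file names below are made up for illustration.

```shell
root=$(mktemp -d)
mkdir -p "$root/sub"
touch "$root/a.py" "$root/sub/b.ipynb" "$root/photo.JPG" "$root/notes.txt"
head -c 2097152 /dev/zero > "$root/big.bin"    # 2 MiB file

# -iname ignores case, so photo.JPG matches '*.jpg'.
jpg_count=$(find "$root" -type f -iname '*.jpg' | wc -l)

# -o combines two name tests; the escaped parentheses group them.
code_count=$(find "$root" -type f \( -name '*.py' -o -name '*.ipynb' \) | wc -l)

# +1M selects files larger than 1 MiB; only big.bin qualifies here.
big_file=$(find "$root" -type f -size +1M)

echo "jpg: $jpg_count, code: $code_count, big: $big_file"
```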

awk 的使用

awk processes the output of another command and extracts the content of interest.

cat failed_vcf.txt| awk -F '/' '$1 ~ /09ab/ {print $NF}' | xargs -d '\n' -I{} touch {}
  1. -F defines the delimiter, here /
  2. $NF is the last column
  3. $1 ~ /09ab/ : a regex test that the first column must contain 09ab
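
The filter can be tested on inline data before pointing it at a real file. The paths below are made up; only the awk expression is taken from the command above.

```shell
# Made-up input: each line is a path whose first component is a sample ID.
paths='09ab_run/vcf/sample1.vcf
77cd_run/vcf/sample2.vcf
09ab_run/vcf/sample3.vcf'

# -F '/' splits on slashes; $1 is the first component, $NF the last one.
selected=$(printf '%s\n' "$paths" | awk -F '/' '$1 ~ /09ab/ {print $NF}')
echo "$selected"
```

Note that for absolute paths, which start with /, the first field is empty, so the test on $1 would need to move to a later field.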

download multiple illumina project data through xargs

cat download-list.txt | xargs -d '\n'  -I{} /home/user/Softwares/linux-amd64/icav2 projectdata download {}
  1. download-list.txt contains the folder IDs or file IDs that you can find in Illumina ICA
  2. xargs passes its input as arguments to another command. Here -d defines the delimiter; -I{} defines a placeholder named {} that is replaced by each input item in the command
  3. /home/user/Softwares/linux-amd64/icav2: the absolute path to the command. Using just icav2 does not work if it is only defined as an alias, even when the alias works in your bash terminal, because xargs runs the program directly and does not expand shell aliases.
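
Before running a real download, the same pipeline can be rehearsed with echo in place of the command. In this sketch the IDs are made up and nothing is downloaded; -d is a GNU xargs option.

```shell
# Made-up IDs standing in for the contents of download-list.txt.
id_list=$(mktemp)
printf 'fil.001\nfol.002\n' > "$id_list"

# echo stands in for the real download command, so nothing is downloaded.
# -d '\n' (GNU xargs) treats each line as one argument; -I{} substitutes it.
planned=$(xargs -d '\n' -I{} echo "would download {}" < "$id_list")
echo "$planned"
```

With -I{}, xargs runs the command once per input line, which is what a per-ID download needs.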

How to find all files containing specific text (string) on Linux

grep -rnw '/path/to/somewhere/' -e 'pattern'
  1. -r or -R searches recursively
  2. -n shows the line number of each match in the output
  3. -w matches whole words only. Without it, the pattern may also match inside a longer word. For example, when searching for ‘name’ in all files, with -w only the word name satisfies the requirement; without it, hello_name also matches.
  4. -l (lower-case L) prints only the names of the files that contain a match, without the matching lines
  5. -L (upper-case L) prints only the names of the files that do not contain a match, the opposite of -l
  6. -i : ignore case distinctions in patterns.
  7. -e specifies the pattern to search for
grep -rnwil './' -e 'id'
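
The difference between a whole-word and a substring match is easiest to see on two tiny files. The file names and contents below are made up for illustration.

```shell
search_dir=$(mktemp -d)
echo 'name = "alice"' > "$search_dir/whole_word.txt"
echo 'hello_name = "bob"' > "$search_dir/part_of_word.txt"

# Without -w, "name" also matches inside hello_name: both files are listed.
loose_count=$(grep -rl "$search_dir" -e 'name' | wc -l)

# With -w only whole_word.txt matches, because _ counts as a word character.
strict_files=$(grep -rlw "$search_dir" -e 'name')

# -L lists the files that do NOT contain a match.
unmatched=$(grep -rLw "$search_dir" -e 'name')

echo "loose: $loose_count, strict: $strict_files, unmatched: $unmatched"
```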

What is the difference between the Bash operators [[ vs [ vs ( vs ((?

How to parse bash script arguments

  • use getopt
  • Resource 2
  • a colon (:) after an option indicates that the option expects a value, like "-d river" rather than a bare "-d"
#!/usr/bin/bash
MAG="\e[35m"
RED="31"
GREEN="32"
BOLDGREEN="\e[1;${GREEN}m"
ITALICRED="\e[3;${RED}m"
ERROR="\e[4;3;1;${RED}m"
ENDCOLOR="\e[0m"

WORKFLOW="/scripts/workflow_autoReport.py"
environment='docx'

help()
{
    echo -e "Usage: Render Report from template using tsv format OKR, VCF file and comment database 
            [-h, --help] Show this help message and exit
            [-i, --path_catalog] A csv format file that contains the path of VCF and OKR, which has two columns: path_tsv_OKR, path_vcf
            [-t , --name_template] The template Name.
            [-d , --path_database] The path of comment database.
            "
    exit 2
}

# A trailing colon marks an option that takes a value; long options are comma-separated.
SHORT=i:t:d:h
LONG=path_catalog:,name_template:,path_database:,help
OPTS=$(getopt -a -n render_template --options $SHORT --longoptions $LONG -- "$@") || help

VALID_ARGUMENTS=$# # Number of command-line arguments passed to the script

if [ "$VALID_ARGUMENTS" -eq 0 ]; then
  help
fi

eval set -- "$OPTS" 

while :
do
  case "$1" in
    -i | --path_catalog )
      path_catalog="$2"
      shift 2
      ;;
    -t | --name_template )
      name_template="$2"
      shift 2
      ;;
    -d | --path_database)
      path_database="$2"
      shift 2
      ;;
    -h | --help)
      help
      ;;
    --)
      shift;
      break
      ;;
    *)
      echo -e "Unexpected option: $1"
      help
      ;;
  esac
done


[ -z "$path_catalog" ] && echo -e "${ERROR}Please provide the catlog file using the option -i${ENDCOLOR}" && help
[ -z "$name_template" ] && echo -e "${ERROR}Please provide the template name using the option -t${ENDCOLOR}" && help
[ -z "$path_database" ] && echo -e "${ERROR}Please provide the comment database path using the option -d${ENDCOLOR}" && help

echo "path_catalog is : ${path_catalog}, name_template is : $name_template, path_database is : $path_database"

# Run the workflow with the parsed arguments

$WORKFLOW -i "$path_catalog" -t "$name_template" -d "$path_database"
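
The role of the colon can be seen in isolation with a smaller sketch. It assumes the util-linux getopt found on most Linux distributions; the function parse_demo and the options -d/--dir and -v/--verbose are hypothetical and not part of the script above.

```shell
parse_demo() {
    # d: and dir: take a value (trailing colon); v and verbose are plain flags.
    parsed=$(getopt -n parse_demo --options d:v --longoptions dir:,verbose -- "$@") || return 2
    eval set -- "$parsed"

    dir_value=""
    verbose="no"
    while :; do
        case "$1" in
            -d | --dir) dir_value="$2"; shift 2 ;;
            -v | --verbose) verbose="yes"; shift ;;
            --) shift; break ;;
        esac
    done
    echo "dir=$dir_value verbose=$verbose"
}

parse_demo -d river --verbose
parse_demo --dir=lake
```

getopt normalizes the arguments and always appends --, which is why the loop can rely on reaching the -- branch to stop.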