Table of contents
  1. decorator
  2. Create temporary file and folder
  3. Structure the project folder
  4. install packages
    1. conda
  5. Python
    1. Use dateutil to convert a string to datetime
    2. Use YAML as the config file format
    3. use shutil to copy files
    4. use pipreqs to extract the packages used in the script
    5. use glob to search specific files in specific depth
    6. use * to unpack the list
    7. format string
    8. print space
    9. set
    10. defaultdict
    11. bisect
    12. sort OR sorted
    13. range
    14. ord
    15. heapq
    16. deque
    17. one line equation
  6. re
    1. Using re
    2. use the separator semicolon but the semicolon not in square brackets
    3. use the separator semicolon but the semicolon not in quotes
    4. Match multiple alternatives
  7. openpyxl
    1. Fill Excel cells with color

decorator

Here is a good boilerplate template for building more complex decorators, from Real Python:

import functools

def decorator(func):
    @functools.wraps(func)
    def wrapper_decorator(*args, **kwargs):
        # Do something before
        value = func(*args, **kwargs)
        # Do something after
        return value
    return wrapper_decorator

Here is a real example, as illustrated on Real Python:

import functools
import time

# ...

def timer(func):
    """Print the runtime of the decorated function"""
    @functools.wraps(func)
    def wrapper_timer(*args, **kwargs):
        start_time = time.perf_counter()
        value = func(*args, **kwargs)
        end_time = time.perf_counter()
        run_time = end_time - start_time
        print(f"Finished {func.__name__}() in {run_time:.4f} secs")
        return value
    return wrapper_timer
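
To see the timer decorator in use, here is a minimal sketch. The decorated function sum_of_squares is made up for illustration and is not from the original article; the decorator itself is repeated so the snippet runs on its own.

```python
import functools
import time

def timer(func):
    """Print the runtime of the decorated function."""
    @functools.wraps(func)
    def wrapper_timer(*args, **kwargs):
        start_time = time.perf_counter()
        value = func(*args, **kwargs)
        run_time = time.perf_counter() - start_time
        print(f"Finished {func.__name__}() in {run_time:.4f} secs")
        return value
    return wrapper_timer

@timer
def sum_of_squares(n):
    """Return the sum of i**2 for i in range(n) (hypothetical example function)."""
    return sum(i ** 2 for i in range(n))

# Calling the function prints the elapsed time and still returns the result
result = sum_of_squares(1000)

# functools.wraps keeps the original name and docstring on the wrapper
print(sum_of_squares.__name__)
```

Without functools.wraps, sum_of_squares.__name__ would report wrapper_timer instead, which makes debugging output harder to read.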

Create temporary file and folder

When you upload a file to a Flask server, you often want to save it to a temporary folder and delete it once you have finished using it. The tempfile module handles this. Its advantage is that the temporary files and folders are deleted automatically when they are closed, cleaned up, or garbage-collected.

import tempfile
path_temp = 'temp' # a custom parent folder for the temporary data; it must already exist
path_tempCaseFolder = tempfile.TemporaryDirectory(dir=path_temp) # create a temporary folder inside the custom folder
path_okr_DNA = tempfile.NamedTemporaryFile(suffix='.txt',dir=path_tempCaseFolder.name).name # generate a unique .txt path in the temporary folder. Note that you need .name to get the path of the temporary file or folder; the file object itself is not kept here, so only the path remains
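
The automatic cleanup is easiest to see with context managers. This is only a sketch of one possible pattern, not the Flask code from my project: the file content and the variable names are placeholders, and delete=False is my own choice so that the file survives closing until the folder is removed.

```python
import os
import tempfile

# TemporaryDirectory as a context manager: the folder and everything in it
# are removed when the with-block exits.
with tempfile.TemporaryDirectory() as tmp_dir:
    # delete=False keeps the file on disk after the handle is closed,
    # so the path stays usable until the folder itself is cleaned up.
    with tempfile.NamedTemporaryFile(suffix='.txt', dir=tmp_dir, delete=False) as tmp_file:
        tmp_file.write(b'uploaded content')  # placeholder for the uploaded data
        path_upload = tmp_file.name
    existed_inside = os.path.exists(path_upload)

# After the outer with-block, the folder and the file are gone
exists_after = os.path.exists(tmp_dir)
print(existed_inside, exists_after)
```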

Structure the project folder

If a project has a complicated structure and a submodule uses relative imports, you cannot run the Python file inside that submodule directly. It fails with an error like ImportError: attempted relative import with no known parent package. That is because the file does not know which package it belongs to: if you add print(__package__) to the file, it prints None, so the parent package is unknown. If the submodule is instead imported from a script located at an upper level, the relative imports work, because the package is then the name of the folder that contains the submodule.

To make the submodule usable in both situations, use pathlib and sys to put the project folder, not the submodule folder, on the import path.

import sys
import pathlib

sys.path.insert(0,str(pathlib.Path(__file__).resolve().parent.parent)) # use as many .parent steps as needed to reach the project folder

After that, you can run the submodule on its own or import it from an upper-level script.

install packages

conda

conda install package_name=version

conda install 'package_name>= version'

conda install 'package_name<= version'
If any of the characters '>', '<', ' ' or '*' is used, the argument must be wrapped in single or double quotes.

Python

Use dateutil to convert a string to datetime

from dateutil import parser
import pandas as pd

s = '7/1/2021 6:29'
parser.parse(s)
# output: datetime.datetime(2021, 7, 1, 6, 29)

parser.parse(s).strftime('%m/%d/%Y')
# output: '07/01/2021'

# calculate the TAT in days. endDate and startDate are datetime objects defined elsewhere
tat = round((endDate-startDate)/pd.Timedelta(hours=24), 1)

You can then format the result however you like. The package is easy and friendly to use: it converts the string to a datetime directly, without a format specification.

Use YAML as the config file format

YAML: originally “Yet Another Markup Language”, now officially “YAML Ain’t Markup Language”

  1. install: pip install PyYAML
  2. import : import yaml
  3. load : with open('config.yml','r') as f: config = yaml.safe_load(f)
  4. dump : with open('config.yml','w') as f: yaml.safe_dump(config, f)

use shutil to copy files

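When I used VSCode to connect to a remote server and ran code there, shutil.copy(path_src,path_dest) failed with a Permission denied error, so files could not be copied into certain directories. With a different destination folder the copy succeeded. Switching to shutil.copyfile(file_src,file_dest) worked without problems. A likely explanation is that shutil.copy also copies the permission bits after copying the contents, which some mounted file systems refuse, while shutil.copyfile copies the contents only.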
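
The difference between the two functions can be sketched as follows. The file and folder names are placeholders, and the snippet runs in a throwaway temporary directory, so it does not reproduce the permission error itself.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    file_src = Path(tmp_dir) / 'source.txt'
    file_src.write_text('hello')

    # copyfile: copies the contents only; the destination must be a full file path
    file_dest = Path(tmp_dir) / 'copy_contents_only.txt'
    shutil.copyfile(file_src, file_dest)

    # copy: copies the contents plus the permission bits;
    # the destination may be a directory, and the new file path is returned
    dir_dest = Path(tmp_dir) / 'subfolder'
    dir_dest.mkdir()
    path_copied = shutil.copy(file_src, dir_dest)

    text_copyfile = file_dest.read_text()
    text_copy = Path(path_copied).read_text()
    name_copied = Path(path_copied).name

print(text_copyfile, text_copy, name_copied)
```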

use pipreqs to extract the packages used in the script

pip3 install pipreqs

pipreqs ./your_script_directory

use glob to search specific files in specific depth

If you don’t care about the depth of the directory tree, you can use pathlib.Path.rglob(pattern) to search for files. The problem I ran into is that this method raises an error when a file or path has a very long name. Another method is to use glob to search at a specific depth of the directory tree, as shown below.

from pathlib import Path
import os,glob

pattern_file = r'Chip Loading*.xlsm'
path_folder = Path('/mnt/hgsf/test')
list_desiredFiles = path_folder.rglob(pattern_file) # a generator that searches all depths

# another method
# depth zero: files directly in path_folder
list_desiredFiles_d0 = glob.glob(os.path.join(path_folder,pattern_file))
list_desiredFiles_d1 = glob.glob(os.path.join(path_folder,'*',pattern_file))
list_desiredFiles_d2 = glob.glob(os.path.join(path_folder,'*','*' ,pattern_file)) # the number of '*' components is the depth below path_folder
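
To avoid writing one line per depth, the '*' components can be generated. The helper files_at_depth below is my own illustrative name, not part of glob, and the folder tree is created in a temporary directory just for the demonstration.

```python
import glob
import os
import tempfile
from pathlib import Path

pattern_file = 'Chip Loading*.xlsm'

def files_at_depth(folder, pattern, depth):
    """Return the matches located exactly `depth` subfolders below `folder`."""
    return sorted(glob.glob(os.path.join(folder, *['*'] * depth, pattern)))

with tempfile.TemporaryDirectory() as tmp_dir:
    root = Path(tmp_dir)
    # Build a small tree with one matching file at depth 0, 1 and 2
    (root / 'a' / 'b').mkdir(parents=True)
    (root / 'Chip Loading 0.xlsm').touch()
    (root / 'a' / 'Chip Loading 1.xlsm').touch()
    (root / 'a' / 'b' / 'Chip Loading 2.xlsm').touch()

    names_d0 = [os.path.basename(p) for p in files_at_depth(root, pattern_file, 0)]
    names_d1 = [os.path.basename(p) for p in files_at_depth(root, pattern_file, 1)]
    names_d2 = [os.path.basename(p) for p in files_at_depth(root, pattern_file, 2)]
    # rglob ignores the depth and finds all three files
    names_all = sorted(p.name for p in root.rglob(pattern_file))

print(names_d0, names_d1, names_d2, names_all)
```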

use * to unpack the list


for a in zip([[1,2,3],[4,5,6]]):
    print(a)

# output 
#([1, 2, 3],)
#([4, 5, 6],)

for a in zip(*[[1,2,3],[4,5,6]]): # add * to unpack a list
    print(a)
    
# output
#(1, 4)
#(2, 5)
#(3, 6)


format string

See Real Python for reference.

score = 0.7777777
print('The score is %.3f'%score) # output : The score is 0.778

for f in range(X_train.shape[1]):
    print("%2d) %-*s %f"% (f+1, 30,
                           feat_labels[indices[f]],
                           importances[indices[f]]))

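In the code above:

  • %2d) corresponds to the item number, e.g. 1), written in a field 2 characters wide.
  • %-* corresponds to the 30 spaces. The * takes the field width from the arguments, here 30. The minus sign (-) pads with spaces after the text up to 30 characters; without the minus sign the spaces are added before the text.
  • s corresponds to feat_labels[indices[f]]
  • %f corresponds to importances[indices[f]]

[Figure: string format with space]

The following shows the result when the string is not padded with spaces:

[Figure: string format with space]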
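
Since the loop above depends on variables from a training script, here is a self-contained sketch of the same format string. The labels and importances are invented values for illustration, and I use a width of 10 instead of 30 to keep the output short.

```python
# Hypothetical feature names and importances, only for illustration
feat_labels = ['age', 'income', 'height']
importances = [0.5, 0.3, 0.2]

lines = []
for f, (label, importance) in enumerate(zip(feat_labels, importances)):
    # %-*s takes the width (10) from the argument list and left-aligns the label
    lines.append("%2d) %-*s %f" % (f + 1, 10, label, importance))

# With the minus sign the padding goes after the text, without it before the text
left_aligned = "%-*s|" % (10, 'age')
right_aligned = "%*s|" % (10, 'age')

print('\n'.join(lines))
print(left_aligned)
print(right_aligned)
```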

set

defaultdict

from collections import defaultdict
dic_count = defaultdict(int) # for counting, default value is 0
dic_list = defaultdict(list) # for collecting items, default value is [], an empty list
dic_lambda = defaultdict(lambda : 'ok')
 # default value is the return value of the lambda, here 'ok',
# you can also set other values such as empty string, float('inf')

To nest one defaultdict inside another you need a lambda, for example: table_food = defaultdict( lambda : defaultdict(int) ). Without the lambda, defaultdict(defaultdict(int)) raises an error, because the argument must be callable. See for example LeetCode 1418, Display Table of Food Orders in a Restaurant.
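
A small sketch of the nested defaultdict in action. The orders list is invented sample data in the spirit of that problem, not the actual problem input.

```python
from collections import defaultdict

# Hypothetical orders as (table number, dish) pairs
orders = [(3, 'Ceviche'), (10, 'Beef Burrito'), (3, 'Fried Chicken'), (3, 'Ceviche')]

# Outer key: table; inner key: dish; inner default value: 0
table_food = defaultdict(lambda: defaultdict(int))
for table, dish in orders:
    table_food[table][dish] += 1

# Passing a defaultdict instance instead of a callable raises TypeError
try:
    defaultdict(defaultdict(int))
    raised = False
except TypeError:
    raised = True

print(dict(table_food[3]), raised)
```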


bisect

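bisect.bisect_left(stack,idx): uses binary search to find the position at which idx would be inserted into the sorted list stack. If idx is already present, bisect_left returns the position to the left of the existing entries, while bisect.bisect_right(stack,idx) returns the position to the right of them.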

import bisect

a = [1,2,3,4,5]

bisect.bisect_left(a,3)
# output : 2

bisect.bisect_right(a,3)
# output : 3
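
The difference between the two functions is clearer when the list contains duplicates. The list scores below is made-up sample data.

```python
import bisect

scores = [10, 20, 20, 20, 30]

left = bisect.bisect_left(scores, 20)    # index of the first 20
right = bisect.bisect_right(scores, 20)  # index just past the last 20
count_of_20 = right - left               # number of entries equal to 20

# insort inserts a value while keeping the list sorted
bisect.insort(scores, 25)

print(left, right, count_of_20, scores)
```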

sort OR sorted

a = ['ok','as','apple','pear','desk']
b = sorted([(word,len(word)) for word in a], key=lambda x: (-x[1],x[0]))
 # output of b :[('apple', 5), ('desk', 4), ('pear', 4), ('as', 2), ('ok', 2)]

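Note: the key determines the sort order. The first criterion is the number of characters in the word, x[1]; the minus sign makes the longer words come first. Words with the same number of characters are then ordered alphabetically, so for example abe comes before abm.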
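
The heading mentions both sort and sorted, so here is a short sketch of how they differ, using the same word list and the same key idea as above.

```python
words = ['ok', 'as', 'apple', 'pear', 'desk']

# sorted() returns a new list and leaves the input unchanged
by_length = sorted(words, key=lambda w: (-len(w), w))
unchanged = list(words)

# list.sort() sorts in place and returns None
return_value = words.sort(key=lambda w: (-len(w), w))

print(by_length, unchanged, words, return_value)
```

Use sorted when you need to keep the original order or when the input is not a list; use sort when the list can be modified directly.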

range

for i in range(3,0,-1):
    print(i)
    # output : 3,2,1

for i in range(0,10,2):
    print(i)
    #output : 0,2,4,6,8

ord

  • ord("A") -> 65 : converts a character to its ASCII code (decimal)
  • chr(65) -> 'A': converts an ASCII code to its character

See the ASCII table:

[Figure: ASCII table]

heapq

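After heapify is applied to a list, the element at index 0 is guaranteed to be the smallest value, so Python’s built-in heapq is a min-heap. Every heappop and heappush changes the arrangement of the values in the list.

If you want a max-heap, put a minus sign in front of each number before adding it to the heap, and negate it again when you take it out.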

import heapq

ls = list([1, 3, 5, 7, 9, 2, 4, 6, 8, 0])

heapq.heapify(ls) # heapify: the order of the elements in ls changes, and ls[0] is now the smallest number
ls 
 # output : [0, 1, 2, 6, 3, 5, 4, 7, 8, 9]

heapq.heappop(ls) # pop the smallest number; the data in ls changes again
ls
 # output: [1, 3, 2, 6, 9, 5, 4, 7, 8]

heapq.heappush(ls,10) # add a value to the heap
ls
 # output: [1, 3, 2, 6, 9, 5, 4, 7, 8, 10]
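
The max-heap trick with minus signs can be sketched like this, using the same numbers as above. The variable names are my own.

```python
import heapq

numbers = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]

# Store negated values so that the smallest stored value is the largest original
max_heap = [-n for n in numbers]
heapq.heapify(max_heap)

largest = -heapq.heappop(max_heap)  # negate again when taking a value out
heapq.heappush(max_heap, -10)       # negate when adding a value
new_largest = -max_heap[0]          # peek at the top without popping

# If only the k largest values are needed, heapq.nlargest avoids the negation
top_three = heapq.nlargest(3, numbers)

print(largest, new_largest, top_three)
```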

deque

Python’s deque can act as a queue: first in, first out. It also lets you add and remove items at both ends.

from collections import deque

q = deque([1,2,3,4,5])

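q.append(0) # by default, items are added at the right end
q
 # output:deque([1, 2, 3, 4, 5, 0])

q.appendleft(9) # add at the left end
q
 # output: deque([9, 1, 2, 3, 4, 5, 0])

q.pop() # by default, items are removed from the right end
 # output : 0

q.popleft() # remove from the left end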
 # output : 9
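
As a sketch of the first in, first out behaviour: append on the right and popleft on the left. The task names are placeholders. The maxlen argument shown at the end is an extra deque feature not covered above.

```python
from collections import deque

# FIFO queue: add on the right, take from the left
queue = deque()
for task in ['a', 'b', 'c']:
    queue.append(task)

processed = []
while queue:
    processed.append(queue.popleft())

# With maxlen, the oldest items are dropped automatically when the deque is full
recent = deque(maxlen=3)
for i in range(5):
    recent.append(i)

print(processed, list(recent))
```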

one line equation

  • With only an if, write it like this: [x for x in range(10) if x%2 == 0]
  • With both if and else, write it like this: [x if x%2==0 else x*10 for x in range(10)]
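
The two forms do different things: a trailing if filters items out, while if/else in front keeps every item and only changes its value. A quick check:

```python
# Trailing if: filter, only the even numbers are kept
evens = [x for x in range(10) if x % 2 == 0]

# Leading if/else: transform, every number is kept and odd ones are multiplied by 10
mixed = [x if x % 2 == 0 else x * 10 for x in range(10)]

print(evens)
print(mixed)
```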

re

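Python provides regular expression operations through the re package.

import re
  • We usually write the pattern as r'pattern'. The raw string avoids the trouble caused by backslashes. For example, to check whether a string contains a literal backslash \, the pattern has to be written as '\\\\' or r'\\'. Using r makes the pattern easier to write and to understand: what is inside the quotes is exactly what the regex engine receives, with no extra layer of Python string escaping.

  • \w: for Unicode (str) patterns this matches word characters; with the ASCII flag it is equivalent to [a-zA-Z0-9_], i.e. all lowercase letters, uppercase letters, the digits 0 to 9, and the underscore.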
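
A short sketch of both points. The path and the sample sentence are made-up inputs.

```python
import re

path = r'C:\temp\report.txt'  # raw string, so \t and \r stay as backslash + letter

# Both spellings are the same two-character pattern that matches one literal backslash
plain = '\\\\'
raw = r'\\'
same_pattern = plain == raw

parts = re.split(raw, path)

# \w+ matches runs of letters, digits and underscores
words = re.findall(r'\w+', 'snake_case, CamelCase & 42!')

print(same_pattern, parts, words)
```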

Using re

There are two ways to use re:

  1. Compile the pattern first
prog = re.compile(pattern)
result = prog.match(string)

First use re.compile(pattern) to create a regular expression object, then call its methods, like match() and search().

  2. Call the re functions directly, passing both the pattern and the string.
result = re.match(pattern, string)

Both ways give the same result. If the same regular expression is used many times, re.compile is more efficient.
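
Both styles side by side, as a minimal sketch. The date pattern and the sample lines are my own examples.

```python
import re

lines = ['2021-07-01 start', 'no date here', '2021-07-02 end']
pattern = r'(\d{4})-(\d{2})-(\d{2})'

# Way 1: compile once, then reuse the compiled object
prog = re.compile(pattern)
compiled_hits = [m.group(0) for m in map(prog.match, lines) if m]

# Way 2: pass the pattern to the module-level function on every call
direct_hits = [m.group(0) for m in (re.match(pattern, s) for s in lines) if m]

print(compiled_hits, direct_hits)
```

Note that match() only matches at the beginning of the string, which is why the second line gives no result; search() would scan the whole string. The re module also caches recently used patterns, so in practice the speed difference between the two ways is often small.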

use the separator semicolon but the semicolon not in square brackets

strung = filter(None, re.split(r';(?![^\[\]]*\])', string))

(?![^\[\]]*\]) is a negative lookahead asserting that the ; is not within [...]. However, this regex does not check whether the semicolon is really inside square brackets: it only rejects a semicolon that is followed by a closing square bracket with no other bracket in between, without checking for the opening one. filter(None, ...) drops the empty strings from the result.
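
A quick sketch of the behaviour, including the limitation with a missing opening bracket. The input strings are invented for illustration.

```python
import re

pattern = r';(?![^\[\]]*\])'

# The semicolon inside [...] is kept; the trailing empty field is filtered out
text = 'a;b[c;d];e;'
parts = list(filter(None, re.split(pattern, text)))

# Limitation: a closing bracket without an opening one also blocks the split
unbalanced = re.split(pattern, 'x;y]')

print(parts)
print(unbalanced)
```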

use the separator semicolon but the semicolon not in quotes

re.split(''';(?=(?:[^'"]|'[^']*'|"[^"]*")*$)''', data)

Each time it finds a semicolon, the lookahead scans the entire remaining string, making sure there’s an even number of single-quotes and an even number of double-quotes. (Single-quotes inside double-quoted fields, or vice-versa, are ignored.) If the lookahead succeeds, the semicolon is a delimiter.

If the number of remaining single quotes or double quotes is odd, the semicolon must be inside a quoted field, so it is not treated as a separator.
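
Here is the pattern applied to a small made-up input with one double-quoted and one single-quoted field.

```python
import re

pattern = r''';(?=(?:[^'"]|'[^']*'|"[^"]*")*$)'''

# Semicolons inside "..." or '...' are kept; the others split the string
data = """a;"b;c";'d;e';f"""
fields = re.split(pattern, data)

print(fields)
```

Because the lookahead rescans the rest of the string for every semicolon, this approach can become slow on very long inputs; for real CSV-like data the csv module may be a better fit.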

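Match multiple alternatives

Use () together with | to match one of several alternatives. In the example below, the ? after reports makes the s optional, and (is|are|were|was) means that any of these four words can appear at that position. output.group(0) returns the whole matched text, and because the alternatives are in parentheses, output.group(1) returns the word that was matched at that position. .*? is a non-greedy match.

output = re.search(r'Tax reports? (is|are|were|was) reported .*?(\n|\.)', text) # text is the string to search
output.group(0)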
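
A runnable sketch of the same pattern. The sample sentences are invented for illustration.

```python
import re

pattern = r'Tax reports? (is|are|were|was) reported .*?(\n|\.)'

text = 'Summary. Tax reports were reported on 7/1/2021.\nNext line'
output = re.search(pattern, text)

whole = output.group(0)   # the entire match
verb = output.group(1)    # which alternative matched: is, are, were or was
ending = output.group(2)  # what ended the match: a newline or a period

# The optional s: the singular form matches too, here ending at the newline
singular = re.search(pattern, 'Tax report is reported today\n').group(0)

print(repr(whole), verb, repr(ending), repr(singular))
```

re.search returns None when nothing matches, so in real code check the result before calling group().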

openpyxl

Fill Excel cells with color

from openpyxl.styles import PatternFill

def color_ws_reportDate(ws,column='M'):
    """Color the report date in ws : reported : green , unreported: red
    - ws : worksheet opened by openpyxl
    """
    fillGreen = PatternFill(fill_type='solid',
              start_color='6BCB77',
              end_color='6BCB77')

    fillRed = PatternFill(fill_type='solid',
              start_color='FF6B6B',
              end_color='FF6B6B')
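    for cell in ws[column]:
        try:
            # Treat empty cells (None) and pandas-style 'nan' strings as unreported
            value = '' if cell.value is None else str(cell.value).strip()
            if value and value != 'nan':
                print(f'value is : {value}')
                cell.fill = fillGreen
            else:
                cell.fill = fillRed
        except Exception:
            # message() and bcolors are helpers defined elsewhere in the project
            message(f'Error when coloring report date in cell containing: {cell.value}',bcolors.FAIL)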