I'm looking to make a list of toneless pinyin combinations/permutations.
import pandas as pd
data = pd.read_csv('chinese_tones.txt', sep=" ", header=None)
data.columns = ["pinyin", "character"]
# Strip the tone digits (e.g. "cang1" -> "cang")
data['pinyin'] = data['pinyin'].str.replace(r'\d+', '', regex=True)
The current format of the data is:
| pinyin | character |
|--------|-----------|
| cang | 仓 |
| cang | 藏 |
| cao | 操 |
| cao | 曹 |
| cao | 草 |
The expected result would be a list like:
cangcang
cangcao
caocang
caocao
I can dedupe and clean the list myself. I just need every ordered pairing of two pinyin syllables, including a syllable paired with itself.
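
To make the goal concrete, here is a minimal sketch of the pairing step using `itertools.product` from the standard library. It is only an illustration under some assumptions: the hard-coded `syllables` list stands in for `data['pinyin'].tolist()` after the tone digits are stripped, and the names `syllables`, `unique`, and `pairs` are placeholders I made up.

```python
from itertools import product

# Stand-in for data['pinyin'].tolist() (sample rows from the table above).
syllables = ["cang", "cang", "cao", "cao", "cao"]

# Drop duplicates while keeping the order of first appearance.
unique = list(dict.fromkeys(syllables))

# repeat=2 yields every ordered pair, including a syllable with itself.
pairs = [first + second for first, second in product(unique, repeat=2)]

print(pairs)  # ['cangcang', 'cangcao', 'caocang', 'caocao']
```

With n unique syllables this produces n * n strings, so the result grows quickly for a full pinyin table.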