Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - How to process this list of oddly structured strings with emoji flags into a dictionary?

Imagine we have a list of strings with a defined structure: a flag emoji, a name, and a number that may use a decimal comma or end in a percent sign. What would be the simplest way to parse such a list into a dictionary?

mylist = [
    '🇺🇸Zynex 0,6',
    '🇺🇸PayPal 11',
    '🇺🇸PetIQ 0,5',
    '🇺🇸First Solar 0,7',
    '🇺🇸Upwork 1%',
    '🇺🇸NV5 Global 0,8',
    '🇺🇸TPI Composites 1',
    '🇺🇸Fiserv 0,5',
]

And I'm looking to get the result:

{
    'Zynex': 0.6,
    'PayPal': 11.0,
    'PetIQ': 0.5,
    'First Solar': 0.7,
    'Upwork': 1.0,
    'NV5 Global': 0.8,
    'TPI Composites': 1.0,
    'Fiserv': 0.5,
}


1 Answer


It's actually quite simple:

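import re

mylist = [
    '🇺🇸Zynex 0,6',
    '🇺🇸PayPal 11',
    '🇺🇸PetIQ 0,5',
    '🇺🇸First Solar 0,7',
    '🇺🇸Upwork 1%',
    '🇺🇸NV5 Global 0,8',
    '🇺🇸TPI Composites 1',
    '🇺🇸Fiserv 0,5',
]

res = {}
for elem in mylist:
    # Keep only ASCII letters, digits, commas and spaces (this drops the
    # flag emoji and the "%"), then split the last field off as the value.
    key, val = re.sub(r"[^A-Za-z0-9, ]", "", elem).rsplit(" ", 1)
    # The values use a decimal comma, so swap it for a dot before float().
    res[key] = float(val.replace(",", "."))

print(res)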

Output:

{'Zynex': 0.6, 'PayPal': 11.0, 'PetIQ': 0.5, 'First Solar': 0.7, 'Upwork': 1.0, 'NV5 Global': 0.8, 'TPI Composites': 1.0, 'Fiserv': 0.5}
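If some names might contain characters outside A-Z, a-z and 0-9 (accented letters, "&", dots), the substitution above would silently strip them. Here is a rough sketch of a stricter variant that matches the whole line with one pattern instead. The names `LINE_RE` and `parse_line` are made up for this example, and the pattern assumes every line looks like "prefix symbols, name, whitespace, number, optional %":

```python
import re

# Hypothetical pattern, not from the original answer:
#   ^\W*        leading non-word characters (flag emoji are symbols, so they match \W)
#   name        everything up to the last whitespace-separated field
#   value       digits with an optional decimal comma or dot
#   %?          optional trailing percent sign
LINE_RE = re.compile(r"^\W*(?P<name>.+?)\s+(?P<value>\d+(?:[.,]\d+)?)\s*%?$")


def parse_line(line):
    """Return (name, value) for one line, or raise ValueError if it does not fit."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unexpected line format: {line!r}")
    return match["name"], float(match["value"].replace(",", "."))


US_FLAG = "\U0001F1FA\U0001F1F8"  # the US flag written as two escaped code points
sample = [
    US_FLAG + "Zynex 0,6",
    US_FLAG + "Upwork 1%",
    US_FLAG + "First Solar 0,7",
    US_FLAG + "NV5 Global 0,8",
]
parsed = dict(parse_line(s) for s in sample)
print(parsed)
# {'Zynex': 0.6, 'Upwork': 1.0, 'First Solar': 0.7, 'NV5 Global': 0.8}
```

Unlike the one-liner, this raises an error on a line that does not fit the expected shape rather than producing a wrong key or value, which may or may not be what you want for your data.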

Edit: Based on your comments, you also want a textual representation of the flag emojis. A crude solution is something like this:

def flag_to_str(emoji):
    # Take the 4th byte of each 4-byte UTF-8 sequence and shift it down by 101.
    return "".join(chr(c - 101) for c in emoji.encode()[3::4])


print(flag_to_str("🇺🇸"))  # US
print(flag_to_str("🇫🇮"))  # FI

# How it works:
print("🇺🇸".encode())  # b'\xf0\x9f\x87\xba\xf0\x9f\x87\xb8'
print("🇺🇸".encode()[3::4])  # b'\xba\xb8'
print("🇺🇸".encode()[3::4][0])  # 186
print(chr("🇺🇸".encode()[3::4][0] - 101))  # U

Explanation: Most flag emojis are encoded as a sequence of two regional indicator symbols. E.g. 🇺🇸 is 🇺 + 🇸, and its UTF-8 encoding in hexadecimal is f0 9f 87 ba f0 9f 87 b8 (https://onlineutf8tools.com/convert-utf8-to-hexadecimal?input=🇺🇸&prefix=false&padding=false&spacing=true). From there we can see that each regional indicator symbol starts with f0 9f 87, and that the fourth byte is the ASCII code of the corresponding uppercase letter plus 101 (decimal); see https://www.asciitable.com. Thus 0xba is 186, and 186 - 101 = 85, which is the ASCII code of U.
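The same mapping can also be done on code points instead of raw bytes, which avoids the magic number. This is only a sketch: `flag_to_code` and `REGIONAL_INDICATOR_A` are names invented here, and it assumes the input consists solely of regional indicator symbols (U+1F1E6 for A through U+1F1FF for Z), so flags built from other sequences are rejected:

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # code point of REGIONAL INDICATOR SYMBOL LETTER A


def flag_to_code(flag):
    """Map a flag made of regional indicator symbols to its letters, e.g. 'US'."""
    letters = []
    for ch in flag:
        offset = ord(ch) - REGIONAL_INDICATOR_A  # 0 for A, 25 for Z
        if not 0 <= offset < 26:
            raise ValueError(f"not a regional indicator symbol: {ch!r}")
        letters.append(chr(ord("A") + offset))
    return "".join(letters)


# The flags are written as escaped code points so the example survives copy/paste.
print(flag_to_code("\U0001F1FA\U0001F1F8"))  # US
print(flag_to_code("\U0001F1EB\U0001F1EE"))  # FI

# Cross-check with the byte trick: the 4th UTF-8 byte minus 101 is the ASCII code.
print("\U0001F1FA".encode()[3] - 101 == ord("U"))  # True
```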

