Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Extract scientific number from string

I am trying to extract scientific numbers from lines in a text file. Something like

Example:

str = 'Name of value 1.111E-11   Next Name 444.4'

Result:

[1.111E-11, 444.4]

I've tried solutions from other posts, but it looks like they only work for integers (maybe).

>>> [int(s) for s in str.split() if s.isdigit()]
[]

float() would work, but I get an error each time it is given a token that is not a number.

>>> float(str.split()[3])
1.111e-11
>>> float(str.split()[2])
ValueError: could not convert string to float: value

Thanks in advance for your help!!


1 Answer


This can be done with regular expressions:

import re

s = 'Name of value 1.111E-11   Next Name 444.4'
# Raw string so that \. reaches the regex engine as an escaped (literal) period
match_number = re.compile(r'-? *[0-9]+\.?[0-9]*(?:[Ee] *-? *[0-9]+)?')
final_list = [float(x) for x in match_number.findall(s)]
print(final_list)

output:

[1.111e-11, 444.4]

Note that the pattern I wrote above depends on at least one digit existing to the left of the decimal point.
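
As a rough sketch of how that limitation might be lifted, the variant below also accepts numbers such as .5 and an explicit + sign. The names NUMBER and extract_numbers and the pattern itself are assumptions for illustration, not part of the answer above. It drops the optional spaces so that every match can be passed straight to float().

```python
import re

# Assumed variant of the pattern above: optional sign, then either
# digits with an optional fraction ("1", "1.", "1.5") or a bare
# fraction (".5"), then an optional exponent with its own sign.
NUMBER = re.compile(r'[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?')

def extract_numbers(text):
    """Return every number found in text as a float."""
    return [float(m) for m in NUMBER.findall(text)]

print(extract_numbers('Name of value 1.111E-11   Next Name 444.4'))
print(extract_numbers('a -.5 b 3E+3 c 7'))
```

Like the original pattern, this one also picks up digits embedded in words (for example the 2 in "x2"), so it may need word-boundary checks depending on the input.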

EDIT:

Here's a tutorial and reference I found helpful for learning how to write regex patterns.

Since you asked for an explanation of the regex pattern:

r'-? *[0-9]+\.?[0-9]*(?:[Ee] *-? *[0-9]+)?'

One piece at a time:

-?        optionally matches a negative sign (zero or one negative signs)
 *        matches any number of spaces (to allow for formatting variations like - 2.3 or -2.3)
[0-9]+    matches one or more digits
\.?       optionally matches a literal period (zero or one periods); the backslash is needed because a bare . matches any character
[0-9]*    matches any number of digits, including zero
(?: ... ) groups an expression without forming a "capturing group", so re.findall still returns the whole match
[Ee]      matches either "e" or "E"
 *        matches any number of spaces (to allow for formats like 2.3E5 or 2.3E 5)
-?        optionally matches a negative sign
 *        matches any number of spaces
[0-9]+    matches one or more digits
?         makes the entire non-capturing group optional (to allow for the presence or absence of the exponent: 3000 or 3E3)

note: \d is a shortcut for [0-9] (in Python 3 it also matches other Unicode digits unless re.ASCII is set), but I'm just used to using [0-9].
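
The breakdown above can also be kept next to the pattern itself. This is a minimal sketch using re.VERBOSE, which ignores whitespace in the pattern and allows comments. Two things here are my own assumptions rather than part of the answer: the helper name to_floats, and the step that removes spaces before calling float(), added because float() rejects a string like '- 2.3'.

```python
import re

# Same pieces as the breakdown above, written with re.VERBOSE so each
# part can carry a comment. In verbose mode a bare space is ignored,
# so the literal space is written as [ ].
match_number = re.compile(r'''
    -?            # optional minus sign
    [ ]*          # any number of spaces
    \d+           # one or more digits
    \.?           # optional decimal point
    \d*           # any number of digits, including zero
    (?:           # non-capturing group for the exponent
        [Ee]      # "e" or "E"
        [ ]*      # any number of spaces
        -?        # optional minus sign
        [ ]*      # any number of spaces
        \d+       # one or more digits
    )?            # the whole exponent is optional
''', re.VERBOSE)

def to_floats(text):
    """Hypothetical helper: find numbers and strip inner spaces before float()."""
    return [float(x.replace(' ', '')) for x in match_number.findall(text)]

print(to_floats('Name of value 1.111E-11   Next Name 444.4'))
print(to_floats('a - 2.3 b 2.3E 5'))
```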

