Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - numpy delete is converting float values into string

I'm writing a decision tree algorithm in Python for a class, handling both continuous and categorical values, and I'm having problems updating the dataset after choosing the best attribute.

I wrote two functions, delete_rows and delete_attribute, which on each iteration remove some of the examples and one column, respectively. (The algorithm is based on the pseudocode in the Russell-Norvig textbook, which should be the ID3 version.)

new_examples_list = delete_rows(examples_list, best_attr, v)
new_examples_list = delete_attribute(new_examples_list, best_attr)

I don't know NumPy very well, but after searching online I wrote it like this:

def delete_attribute(examples_list, attribute):
    examples_list = numpy.delete(examples_list, attribute, axis=1)
    return examples_list

The problem is that when I call it, all the data in examples_list (the matrix holding the whole dataset) is converted to strings, even the attributes that were originally floats. Since I use different functions for categorical and numeric values, and I check the type with an is_instance function, this causes problems in the following steps of the tree.

Can I solve this just by adjusting the delete_attribute function, or is it probably a bigger problem? I hope I explained myself thoroughly; I'm still new to Python and this is my first time asking a question.

EDIT: I've added an example:

Say that my original data is like this (read from a CSV):

titles = ['A', 'B', 'C', 'D', 'Goal']

data = [[20, 15, 21, 17, 'No'],
        [40, 16, 33, 8, 'Yes'],
        [44, 40, 38, 18, 'No'],
        [18, 16, 21, 2, 'Yes'],
        [7, 12, 8, 40, 'Yes']]

The algorithm finds A to be the best attribute, with a threshold of 19 for splitting the data. Say that we want to see the split data for the values of A > 19. The delete_rows method simply keeps the examples that fit this criterion, and I get:

data = [[20.0, 15.0, 21.0, 17.0, 'No'] 
       [40.0, 16.0, 33.0, 8.0, 'Yes']
       [44.0, 40.0, 38.0, 18.0, 'No']]

When I try to use delete_attribute as shown before to delete the column of A I get this:

data = [['15.0' '21.0' '17.0' 'No']
       ['16.0' '33.0' '8.0' 'Yes']
       ['40.0' '38.0' '18.0' 'No']]

I assume that since the original data has both numerical and string values, it converts everything to strings? I'd like to keep only the last column of the result as strings. Thank you.

In this example all the attributes are numerical, but of course I also have to handle other datasets with mixed values.

Question from: https://stackoverflow.com/questions/66050617/numpy-delete-is-converting-float-values-into-string

1 Answer

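It looks like you need to explicitly set the array's data type to object. A NumPy array holds a single data type, so when numbers and strings are mixed and no dtype is given, every value is converted to a string.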
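The coercion can be reproduced in isolation. This is a minimal sketch that assumes, as the printed output in the question suggests, that delete_rows returns a plain Python list of rows; `rows` is a hypothetical stand-in for that result, not the asker's actual data structure.

```python
import numpy as np

# Hypothetical stand-in for the rows kept by delete_rows (assumed to be a list of lists)
rows = [[20.0, 15.0, 21.0, 17.0, 'No'],
        [40.0, 16.0, 33.0, 8.0, 'Yes'],
        [44.0, 40.0, 38.0, 18.0, 'No']]

# np.delete converts its input to an array first; with no dtype given,
# NumPy picks one common type for all cells, which here is a Unicode string
coerced = np.delete(rows, 0, axis=1)
print(coerced.dtype.kind)    # 'U' means Unicode string
print(type(coerced[0, 0]))   # a NumPy string type, no longer a float

# Converting with dtype=object first keeps each cell as its original Python object
kept = np.delete(np.array(rows, dtype=object), 0, axis=1)
print(kept.dtype)            # object
print(type(kept[0, 0]))      # float
```

If the data really is a list at that point, this would explain why the values only turn into strings once np.delete is called.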

import numpy as np

titles = ['A', 'B', 'C', 'D', 'Goal']

# dtype=object keeps each cell as its original Python object (int or str)
data = np.array([[20, 15, 21, 17, 'No'],
                 [40, 16, 33, 8, 'Yes'],
                 [44, 40, 38, 18, 'No'],
                 [18, 16, 21, 2, 'Yes'],
                 [7, 12, 8, 40, 'Yes']], dtype=object)

def delete_attribute(examples_list, attribute):
    # Remove column index `attribute` (axis=1); the object dtype is preserved
    return np.delete(examples_list, attribute, axis=1)

delete_attribute(data, titles.index('A'))

Output

array([[15, 21, 17, 'No'],
       [16, 33, 8, 'Yes'],
       [40, 38, 18, 'No'],
       [16, 21, 2, 'Yes'],
       [12, 8, 40, 'Yes']], dtype=object)
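Since the question asks whether the fix can live inside delete_attribute, one option is to do the conversion there, so the function behaves the same whether it receives a list or an array. This is only a sketch: `is_numeric_column` is a hypothetical helper standing in for the type check mentioned in the question, and `rows` is assumed sample input.

```python
import numbers
import numpy as np

def delete_attribute(examples_list, attribute):
    # Convert to an object array first so no cell is coerced to a string,
    # then drop column index `attribute` (axis=1)
    examples = np.asarray(examples_list, dtype=object)
    return np.delete(examples, attribute, axis=1)

def is_numeric_column(examples, column):
    # Hypothetical helper: True when every value in the column is a number
    return all(isinstance(value, numbers.Number) for value in examples[:, column])

titles = ['A', 'B', 'C', 'D', 'Goal']
rows = [[20.0, 15.0, 21.0, 17.0, 'No'],
        [40.0, 16.0, 33.0, 8.0, 'Yes'],
        [44.0, 40.0, 38.0, 18.0, 'No']]

new_examples = delete_attribute(rows, titles.index('A'))
print(is_numeric_column(new_examples, 0))   # first remaining column (B)
print(is_numeric_column(new_examples, 3))   # last column (Goal)
```

Note that after a column is deleted, the titles list would need the same entry removed to stay aligned with the column indices.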
