Mixed Data Types from Excel Files
Excel files read into geocluster often have mixed-type columns. This currently manifests in two ways in the attached dataset:
- Age Range (column values are - ) is treated as a numeric type, and when get_clusters() attempts to scale the data it throws a ValueError: Cannot broadcast operands together.
- DRID (a mixed-type column containing strings and very large numbers) causes an overflow error when df.to_json() is called.
@pratap.vardhan suggested a basic data quality check that reads n rows of each column and explicitly forces data types (casting object types to string), run directly after a file has been uploaded. @s.anand thoughts?
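A rough sketch of what that check might look like, to make the suggestion concrete. The function name force_object_columns_to_str, the default n=100, and the sample values are all made up for illustration; none of this is existing geocluster code, and it only touches object columns whose sampled values have more than one Python type.

```python
import pandas as pd


def force_object_columns_to_str(df, n=100):
    """Return a copy of df with mixed-type object columns cast to strings.

    Hypothetical helper: only the first n non-null values of each
    column are inspected to decide whether the column is mixed.
    """
    out = df.copy()
    for col in out.columns:
        if out[col].dtype != object:
            continue  # numeric, datetime, etc. are left untouched
        sample = out[col].dropna().head(n)
        if len({type(v) for v in sample}) > 1:
            # Cast every non-missing value to str; keep missing values as-is
            out[col] = out[col].map(lambda v: v if pd.isna(v) else str(v))
    return out


# Made-up data: an ID column mixing strings with a very large integer
df = pd.DataFrame({
    "DRID": ["A-17", 2**70, "B-42"],
    "Visits": [3, 5, 8],
})
clean = force_object_columns_to_str(df)
payload = clean.to_json()  # the large value is now serialized as a string
```

Sampling only n rows keeps the check cheap on large uploads, at the cost of missing a stray value further down the column, so n would need tuning.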
Both of these errors were triggered by the file Aastha emailed to @sundeep.mallu: Aug_Oct_Clusters.xlsx