Mixed Data Types from Excel Files

Excel files read into geocluster often have mixed-type columns. This currently manifests in 2 ways in the attached dataset:

Age Range (column values are - ) is treated as a numeric type, so when get_clusters() attempts to scale the data it throws a ValueError: Cannot broadcast operands together.
DRID (a mixed-type column containing strings and very large numbers) causes an overflow error when df.to_json() is called.

@pratap.vardhan suggested a basic data quality check that reads n rows of each column and explicitly forces data types (casting object types to string) directly after files have been uploaded. @s.anand, thoughts?
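
A rough sketch of what such a check could look like, assuming the uploaded file ends up in a pandas DataFrame. The helper name force_object_columns_to_str, the default n=100, and the sample values are all made up for illustration and are not part of geocluster. It samples the first n non-null values of each object column and, if they contain strings or more than one Python type, casts the non-missing values to strings.

```python
import pandas as pd


def force_object_columns_to_str(df, n=100):
    """Return a copy of df with string-like or mixed object columns cast to str.

    Hypothetical helper: the name and the default n are illustrative only.
    """
    out = df.copy()
    for col in out.columns:
        if out[col].dtype != object:
            continue  # column already has a single concrete dtype
        # Inspect the Python types of the first n non-null values
        sample = out[col].dropna().head(n)
        kinds = {type(v) for v in sample}
        if len(kinds) > 1 or str in kinds:
            # Keep missing values missing; stringify everything else
            out[col] = out[col].map(lambda v: v if pd.isna(v) else str(v))
    return out


# Made-up data shaped like the two problem columns described above
df = pd.DataFrame({
    "DRID": ["A123", 10**25, "B7"],       # strings plus a very large number
    "Age Range": ["18-25", "26-35", None],
    "Count": [1, 2, 3],
})
clean = force_object_columns_to_str(df)
```

Because only n values are sampled, a column whose first n values share one non-string type would be left untouched even if it is mixed further down. If inspecting values is not needed at all, pd.read_excel also accepts a dtype argument (for example dtype=str), which may be a simpler option at read time.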

Both errors were raised by the file Aastha emailed to @sundeep.mallu: Aug_Oct_Clusters.xlsx

Edited Sep 04, 2018 by Karmanya Aggarwal