How could the content be improved?
The following section introduces how data can be processed using loops:
Automating data processing using For Loops
I believe it would also be advantageous to have a similar section in the following one:
Reading CSV Data Using Pandas
Here we can also briefly introduce Python generators. For example, consider a CSV file whose entries are name, age, and location. We can parse this data into a DataFrame using a generator. Imagine that location is a comma-separated string field and we want to read latitude and longitude as separate columns.
| name | age | location |
|---|---|---|
| John | 50 | 123341,123321 |
| Emily | 25 | 321321,123321 |
| Wick | 35 | 123341,654789 |
| Raj | 40 | 987789,123321 |
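Because location itself contains a comma, `data.csv` has to store it as a single quoted field for `csv.reader` to read it back as one value. A minimal sketch that writes the sample rows to `./data.csv`, assuming lowercase headers as in the table:

```python
import csv

# Rows from the table above (sample data); location stays one "lat,lng" string
rows = [
    ["name", "age", "location"],
    ["John", "50", "123341,123321"],
    ["Emily", "25", "321321,123321"],
    ["Wick", "35", "123341,654789"],
    ["Raj", "40", "987789,123321"],
]

# csv.writer's default quoting (QUOTE_MINIMAL) wraps any field containing the
# delimiter in double quotes, so location is written as "123341,123321"
with open("./data.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)

with open("./data.csv", newline="") as f:
    print(f.read())
```

Each data line in the file then looks like `John,50,"123341,123321"`.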
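```python
import csv

import pandas as pd


def transform_lines(csv_path):
    """Yield one parsed row at a time: name, age, latitude, longitude."""
    # newline="" is what the csv module docs recommend for csv.reader input
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row (name, age, location)
        for name, age, location in reader:
            # location is a quoted "lat,lng" field, e.g. "123341,123321"
            lat, lng = location.split(",")
            yield [name, int(age), float(lat), float(lng)]


lines = transform_lines("./data.csv")
# Pass the header as column names instead of yielding it as a data row
df = pd.DataFrame(lines, columns=["Name", "Age", "Latitude", "Longitude"])
print(df.head())
```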
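This is especially useful for large datasets, where loading the whole file as text at once is memory-consuming: the generator reads and transforms one line at a time, so the raw text is never held in memory in full. Note that `pd.DataFrame` still collects all yielded rows into a list before building the frame, so the parsed rows themselves are held in memory at once.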
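To see the line-at-a-time behaviour in action, here is a minimal sketch. It assumes an `io.StringIO` stand-in for `data.csv`, a hypothetical `transform_rows` variant that accepts an open file object, and an illustrative `parsed_names` list that records each row as it is transformed. Requesting only two rows with `itertools.islice` parses exactly two lines:

```python
import csv
import io
from itertools import islice

import pandas as pd

# In-memory stand-in for a large data.csv (same sample rows as the table)
raw = io.StringIO(
    "name,age,location\n"
    'John,50,"123341,123321"\n'
    'Emily,25,"321321,123321"\n'
    'Wick,35,"123341,654789"\n'
    'Raj,40,"987789,123321"\n'
)

parsed_names = []  # hypothetical tracker: records each row as it is transformed


def transform_rows(f):
    """Hypothetical variant of transform_lines that takes an open file object."""
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for name, age, location in reader:
        parsed_names.append(name)
        lat, lng = location.split(",")
        yield [name, int(age), float(lat), float(lng)]


rows = transform_rows(raw)  # creating the generator runs none of its body yet
print(parsed_names)  # []

# islice stops after two rows, so the remaining lines are never parsed
preview = pd.DataFrame(islice(rows, 2), columns=["Name", "Age", "Latitude", "Longitude"])
print(parsed_names)  # ['John', 'Emily']
print(preview)
```

Calling `next(rows)` afterwards resumes at Wick. A generator can also be consumed only once, so after `pd.DataFrame(lines)` in the example above, `lines` is exhausted.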