Skip to content

Migrate sheets reading to use serde #39

@illicitonion

Description

@illicitonion

We have a bunch of structs which we fill from Google Sheets, one per row, e.g.

trainee-tracker/src/prs.rs

Lines 217 to 224 in aa0d578

#[derive(Debug, PartialEq, Eq, Serialize)]
pub(crate) struct ReviewerStaffOnlyDetails {
pub(crate) name: String,
pub(crate) attended_training: bool,
pub(crate) checked: CheckStatus,
pub(crate) quality: String,
pub(crate) notes: String,
}

We currently do this in an annoyingly manual and inflexible way.

For this particular struct, we populate it here:

for (row_index, cells) in sheet.rows.iter().enumerate() {
if row_index == 0 {
continue;
}
if cells.len() < 6 {
continue;
}
let github_login = GithubLogin::from(cell_string(&cells[0]));
let notes = match cells.get(6) {
Some(cell) => cell_string(cell),
None => String::new(),
};
let checked = match (cell_bool(&cells[3]), cell_bool(&cells[4])) {
(true, false) => CheckStatus::CheckedAndOk,
(true, true) => CheckStatus::CheckedAndCheckAgain,
(false, _) => CheckStatus::Unchecked,
};
reviewers.insert(
github_login.clone(),
ReviewerStaffOnlyDetails {
name: cell_string(&cells[1]),
attended_training: cell_bool(&cells[2]),
checked,
quality: cell_string(&cells[5]),
notes,
},
);
}

We ignore the headings completely, so if someone changed the sheet, we would start reading incorrect values. Then we hard-code the field order and types of each cell.

In other places, we manually check headings, and just error if they're wrong:

for (row_number, cells) in sheet_data.into_iter().enumerate() {
// Some sheets have documentation or pivot table
if row_number == 0 && !cells.is_empty() && cell_string(&cells[0]) != "Name" {
continue;
}
if cells.len() < 7 {
return Err(anyhow::anyhow!(
"Not enough columns for row {} - expected at least 7, got {} containing: {}",
row_number,
cells.len(),
format!("{:#?}", cells),
));
}
if row_number == 0 {
let headings = cells.iter().take(7).map(cell_string).collect::<Vec<_>>();
if headings
!= [
"Name",
"Email",
"Timestamp",
"Course",
"Module",
"Day",
"Location",
]
{
return Err(anyhow::anyhow!(
"Register sheet contained wrong headings: {}",
headings.join(", ")
));
}
} else {
if cells[0].effective_value.is_none() {
break;
}
let (sprint_number, attendance) = read_row(&cells, register_url.clone())
.with_context(|| format!("Failed to read attendance from row {}", row_number))?;
if attendance.timestamp.date_naive() <= start_date
|| attendance.timestamp.date_naive() >= end_date
{
continue;
}
let sprint_index = sprint_number - 1;
while sprints.len() < sprint_number {
sprints.push(IndexMap::new());
}
if sprints[sprint_index].contains_key(&attendance.email) {
warn!(
"Register sheet contained duplicate entry for sprint {} trainee {}",
sprint_number, attendance.email
);
} else {
sprints[sprint_index].insert(attendance.email.clone(), attendance);
}
}
}

We also handle partial rows poorly.

I would love instead for us to just use serde. We could derive Deserialize for each struct we populate, and use a serde adaptor to read the sheet into a Vec<T>.

There is an existing serde_sheets crate for doing this, but it has some issues. If it runs into issues with a row, it just prints an error to stdout and continues (https://docs.rs/serde_sheets/latest/src/serde_sheets/lib.rs.html#200).

Ideally we would either improve this crate, or write our own one. I'm imagining supporting a function along the lines of:

pub async fn read_all<T: DeserializeOwned>(
    sheets: &mut Sheets,
    document_id: &str,
    tab_name: &str,
) -> SheetReadResult<T>;

enum SheetReadResult<T> {
    AllOk(Vec<T>),
    SomeOk {
        ok: Vec<T>,
        partial_records: Vec<Record>,
        deserialization_error_records: Vec<(Record, serde::Error)>,
    },
    Err(SheetsError),
}

struct Record {
    row_number: usize,
    cells: Vec<CellData>,
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions