feature/: Exit Withdraw Date code to handle inclusivity/exclusivity#201
Conversation
|
I meant to say, but forgot, that I don't care if you use exactly what I wrote or rename everything or design it differently or whatever. The bottom line for us is that exit_withdraw_date is exclusive in our world and so we need a version of edu_wh where that is a configurable option. Thanks! |
I struggle with understanding all the implementation reasons and due to a requirement at TDOE, I have changed this back to what it was, but updated the comment so it made more sense.
rlittle08
left a comment
There was a problem hiding this comment.
thank you for the thoroughness, including updating tests.
The design, naming, etc. are all approved. Good catch on the all_student_school_associations_have_calendars bug, too. I think that'd only make a difference in very small edge cases, but it's nice to have the logic consistent.
Two things I want to think through before releasing
- Can/should we go ahead and add this logic in other places within edu_wh? The ones I'm aware of are stu-section and staff-section. This could easily go in a later PR, though, given it's not relevant to TDOE
- I'm actually worried that edu_wh dbt models are the least common place that this inclusivity logic is found. More often, custom models, ad-hoc queries, BI tools, or soon AI query builders are doing this logic. What is the best way for EDU to properly document when a certain end date should be considered inclusive vs. exclusive? Is this an argument for making a larger change to the data model, either adding some kind of metadata column or (more drastically) forcing standardization of end date inclusivity? I imagine the latter would not make you happy at TDOE, given queryers expect based on history that the dates will be exclusive...
|
|
Also, this PR is missing a commit that I made on Feb 19. I undid a change to put back the way it was. I thought it would show up automatically, but it's not for some reason. |
|
@smckee-tnedu do you want to open a fresh pr to get the final change in? I am doing final review and hoping to release this week |
|
Is this the change? |
|
The entire where clause should read like so: i had originally changed this line |
|
@smckee-tnedu got it, thanks! How should the first day of school be treated? With current logic, if |
|
There is a lot going on here and the entire thing is wrapped up in how we process no shows. In our world, we have two types of no shows (our districts don't use the no-show exit withdraw type).
So Districts want to see BOTH of these types of no shows. So, in my testing, this chunk of code allows us to keep our No Show type 2: and this chunk of code allows us to keep our No Show type 1 (merely because your original code assumes exit_withdraw_date inclusivity: Should the above line change because I think your original intent is to EXCLUDE no shows from this fact table? Maybe, but it was working for our purposes, so I left it alone (after an initial attempt to fiddle with it that didn't work). (For additional and unnecessary information, all our No Show types can be defined as "any Enrollment where there exists no School Days within the Enrollment date range". So, you could imagine a 3rd type of Enrollment that doesn't fit in with the 2 defined above where a Student's entire set of enrollment days occurs during the Christmas holiday. We still consider this a no show, even though it's not one of the two main versions.) So, to answer your question, from my perspective, your scenario and the behavior seem to align to me: Assumptions
Reasoning
ConclusionThe enrollment record should be dropped. |
|
@smckee-tnedu much appreciated. We are going forward with your conclusion, that these records should be dropped. We may later revisit a configuration for this particular rule, but we don't need that at the moment. I'm ready to approve and merge. I was able to get your latest commit into the PR by changing the base of the PR off main, then back on main, from your fork repo. Don't ask me why that works lol |
Description & motivation
If you ask Edfi if the enrollment exit_withdraw_date "should considered inclusive (this is the last day of the enrollment and the student attended on that day) or exclusive (the last day the student attended for their enrollment was the day prior)" then they will tell you "treat it however it makes sense for your implementation". Dandy. But EDU assumes that this date is inclusive. Here at TDOE, this is the ONLY date that we consider exclusive. So lots of the models, and what not, are not accurate. In particular, the "is_active_enrollment" in fct_student_school_association. But also, many of the attendance models would also be inaccurate for us. The changes presented incorporate a new variable that can be set,
edu:enroll:exit_withdraw_date_inclusive, and any models that use exit_withdraw_date will now utilize a few new macros to evaluate exit_withdraw_date from an inclusive or exclusive manner, however configured. The default for this variable is that exit_withdraw_date is true, so the output of EDU shouldn't change for people who came before TDOE.Breaking changes introduced by this PR:
There should be no breaking changes. However, two caveats:
tests/integrity_tests/all_student_school_associations_have_calendars.sql. There is a comparison between the first day of school and the enrollment exit_withdraw_date. But everywhere in EDU, it treats exit_withdraw_date as inclusive and uses <= instead of < when comparing dates. Unless I'm missing something, and, admittedly, I might be, this code should have fct_stu_school_assoc.exit_withdraw_date >= first_school_day.first_school_day instead of what is currently there. If the comparator was >= then this would include students who only attended the first day of school and then exited. (I think).snowflake__day_count_in_range. There is a modelfct_student_school_attendance_eventthat would calculate the number of days for an enrollment. In Databricks, for an inclusive end date, this calculated int would be wrong. Take a 2 day enrollment for example, Nov 1 to Nov 2. Supposing Nov 2 is inclusive in this example, the expected value is 2, but the code was producing a 1. I am assuming Snowflake works the same, so I am adding a +1 to the Snowflake calculation in order to account for this. I don't have Snowflake, so I could be wrong and this will need fixing. I am not sure. However, in any event, the value calculated isn't even used in the output of the model, so it may not matter that it might have been off by 1 day. In any event, there may be dragons here.All other changes should be completely transparent since the default behavior of the enhancement favors the current EDU implementation.
PR Merge Priority:
I would prefer to set this to Medium because we want to get off of our own forks of edu_wh and we won't be able to until certain other things are merged into edu_wh main. However, there is another change we have in the queue that deals with adding CDS (and an enhanced impl of CDS at that) to basically every dim and fact. So I left this as Low.
Changes to existing files:
models/core_warehouse/fct_student_daily_attendance.sql: Changed a case statement to use the new macro for determing if a date counts within the enrollment end date.models/core_warehouse/fct_student_school_association.sql: Modified all the comparisons to exit_withdraw_date to use the new macros.models/core_warehouse/fct_student_school_attendance_event.sql: Modified all the comparisons to exit_withdraw_date to use the new macros. This includes a join on dim_calendar_date.models/qc/sections_per_enrollment.sql: Modified all the comparisons to exit_withdraw_date for the overlap logic with section dates.tests/integrity_tests/all_student_school_associations_have_calendars.sql: Modified the comparison to exit_withdraw_date for the students who exited before the first day of school. The comment here is confusing. But also, I think there's a bug in the original code. See above section on stuff that might be broken.tests/value_tests/invalid_enrollments.sql: Modified the comparisons to exit_withdraw_date for the columns of if they exited before their first day or before entry.tests/value_tests/overlapping_enrollments.sql: Modified the join clause to not use between and use the macro instead of the second date normally in the between.New files created:
macros/end_date_utils.sql: Contains the macros necessary to correctly evaluate dates inclusivity/exclusivity.Tests and QC done:
I tested only the databricks side of this since I do not have Snowflake. Things appear to be working as expected.
edu_wh PR Review Checklist:
Make sure the following have been completed before approving this PR:
## New Featuresfor features, etc.