Smooth layer #223
Merged
16 commits
5f7a271 initial kernel smooth (teunbrand)
a2dcd69 add OLS method (teunbrand)
23ac41f add TLS method (teunbrand)
1823367 test: add grouped and ungrouped tests for OLS and TLS regression methods (teunbrand)
6cfa812 address review comments (teunbrand)
917fde8 add grid trimming for smooth and violin layers to prevent extrapolation (teunbrand)
fd66352 remove vestigial density filtering from violin writer (teunbrand)
0834127 add docs (teunbrand)
92c75e3 Merge branch 'main' into smooth (teunbrand)
3923fa8 resolve doc merge issues (teunbrand)
f949ca6 cargo fmt (teunbrand)
f85de83 Change boolean `trim` to numeric `tails` (teunbrand)
4e38e60 add test (teunbrand)
dee722e Merge branch 'main' into smooth (teunbrand)
3cb330d cargo fmt (teunbrand)
59f76de amend based on review (teunbrand)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,153 @@ | ||
| --- | ||
| title: "Smooth" | ||
| --- | ||
|
|
||
| > Layers are declared with the [`DRAW` clause](../clause/draw.qmd). Read the documentation for this clause for a thorough description of how to use it. | ||
|
|
||
| Smooth layers are used to display a trendline among a series of observations. | ||
|
|
||
| ## Aesthetics | ||
|
|
||
| ### Required | ||
| * Primary axis (e.g. `x`): Position along the primary axis. | ||
| * Secondary axis (e.g. `y`): Position along the secondary axis. | ||
|
|
||
| ### Optional | ||
| * `colour`/`stroke`: The colour of the line | ||
| * `opacity`: The opacity of the line | ||
| * `linewidth`: The width of the line | ||
| * `linetype`: The type of line, i.e. the dashing pattern | ||
|
|
||
| ## Settings | ||
|
|
||
| * `method`: Choice of the method for generating the trendline. One of the following: | ||
| * `'nw'` or `'nadaraya-watson'` estimates the trendline using the Nadaraya-Watson kernel regression method (default). | ||
| * `'ols'` estimates a straight trendline using the ordinary least squares method. | ||
| * `'tls'` estimates a straight trendline using the total least squares method. | ||
|
|
||
| The settings below only apply when `method => 'nw'` and are ignored when using other methods. | ||
| * `bandwidth`: A numerical value setting the smoothing bandwidth to use. If absent (default), the bandwidth will be computed using Silverman's rule of thumb. | ||
| * `adjust`: A numerical value as multiplier for the `bandwidth` setting, with 1 as default. | ||
| * `kernel`: Determines the smoothing kernel shape. Can be one of the following: | ||
| * `'gaussian'` (default) | ||
| * `'epanechnikov'` | ||
| * `'triangular'` | ||
| * `'rectangular'` or `'uniform'` | ||
| * `'biweight'` or `'quartic'` | ||
| * `'cosine'` | ||
|
|
||
| ## Data transformation | ||
|
|
||
| ### Nadaraya-Watson kernel regression | ||
|
|
||
| The default `method => 'nw'` computes a locally weighted average of $y$. | ||
|
|
||
| $$ | ||
| y(x) = \frac{\sum_{i=1}^n W_i(x) y_i}{\sum_{i=1}^n W_i(x)} | ||
| $$ | ||
|
|
||
| Where: | ||
|
|
||
| * $W_i(x)$ is the kernel intensity $w_i K\left(\frac{x - x_i}{h}\right)$ of observation $i$ at $x$, where | ||
| * $K$ is the kernel function | ||
| * $h$ is the bandwidth | ||
| * $w_i$ is the weight of observation $i$ | ||
|
|
||
| Please note the similarity of $W_i(x)$ to the [kernel density estimation formula](density.qmd#data-transformation). | ||
|
|
||
| ### Ordinary least squares | ||
|
|
||
| The `method => 'ols'` setting uses ordinary least squares to compute the intercept $a$ and slope $b$ of a straight line. | ||
| The method minimizes the 1-dimensional distance between a point and the vertical projection of that point on the line. | ||
| Only considering the vertical distances implies having measurement error in $y$, but not $x$. | ||
|
|
||
| $$ | ||
| y = a + bx | ||
| $$ | ||
|
|
||
| Wherein: | ||
|
|
||
| $$ | ||
| a = E[Y] - bE[X] | ||
| $$ | ||
|
|
||
| and | ||
|
|
||
| $$ | ||
| b = \frac{\text{cov}(X, Y)}{\text{var}(X)} = \frac{E[XY] - E[X]E[Y]}{E[X^2]-(E[X])^2} | ||
| $$ | ||
|
|
||
| ### Total least squares | ||
|
|
||
| The `method => 'tls'` setting uses total least squares to compute the intercept $a$ and slope $b$ of a straight line. | ||
| The method minimizes the 2-dimensional distance between a point and the perpendicular projection of that point on the line. | ||
| Minimising the perpendicular distances (rather than just the vertical distances) makes sense if there is uncertainty or measurement error in not just $y$, but in $x$ as well. | ||
| In such a case, it is a more accurate depiction of the relationship between $x$ and $y$, but it isn't the best predictor of $y$ given $x$. | ||
|
|
||
| $$ | ||
| y = a + bx | ||
| $$ | ||
|
|
||
| Wherein: | ||
|
|
||
| $$ | ||
| a = E[Y] - bE[X] | ||
| $$ | ||
|
|
||
| and | ||
|
|
||
| $$ | ||
| b = \frac{\text{var}(Y) - \text{var}(X) + \sqrt{(\text{var}(Y) - \text{var}(X))^2 + 4\text{cov}(X, Y)^2}}{2\text{cov}(X, Y)} | ||
| $$ | ||
|
|
||
| ### Properties | ||
|
|
||
| * `weight` is available when using `method => 'nw'`. When mapped, it sets the relative contribution $w_i$ of an observation to the average. | ||
|
|
||
| ### Calculated statistics | ||
|
|
||
| * `intensity` corresponds to $y$ in the formulas described in the [data transformation](#data-transformation) section. | ||
|
|
||
| ### Default remappings | ||
|
|
||
| * `intensity AS y`: By default the smooth layer will display the $y$ in the formulas along the y-axis. | ||
|
|
||
| ## Examples | ||
|
|
||
| The default `method => 'nw'` might be too coarse for time series. | ||
|
|
||
| <!-- Ideally, we would just use the date here directly but we currently require numeric data --> | ||
|
|
||
| ```{ggsql} | ||
| SELECT *, EPOCH(Date) AS numdate FROM ggsql:airquality | ||
| VISUALISE numdate AS x, Temp AS y | ||
| DRAW point | ||
| DRAW smooth | ||
| ``` | ||
|
|
||
| You can make the fit more granular by reducing the bandwidth, for example using `adjust`. | ||
|
|
||
| ```{ggsql} | ||
| SELECT *, EPOCH(Date) AS numdate FROM ggsql:airquality | ||
| VISUALISE numdate AS x, Temp AS y | ||
| DRAW point | ||
| DRAW smooth SETTING adjust => 0.2 | ||
| ``` | ||
|
|
||
| There is a subtle difference between the ordinary and total least squares methods. | ||
|
|
||
| ```{ggsql} | ||
| VISUALISE bill_len AS x, bill_dep AS y FROM ggsql:penguins | ||
| DRAW point | ||
| DRAW smooth MAPPING 'Ordinary' AS colour SETTING method => 'ols' | ||
| DRAW smooth MAPPING 'Total' AS colour SETTING method => 'tls' | ||
| ``` | ||
|
|
||
| Simpson's Paradox is a case where a trend of combined groups is reversed when groups are considered separately. | ||
|
|
||
| ```{ggsql} | ||
| VISUALISE bill_len AS x, bill_dep AS y, species AS stroke FROM ggsql:penguins | ||
| DRAW point SETTING opacity => 0 | ||
| DRAW smooth SETTING method => 'ols' | ||
| DRAW smooth MAPPING 'All' AS stroke SETTING method => 'ols' | ||
| ``` | ||
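
The three formulas in the "Data transformation" section of the added documentation can be checked numerically with a short sketch. This is not the implementation in this pull request; it is a minimal Python illustration under stated assumptions: the function names (`nw_smooth`, `ols_line`, `tls_line`, `moments`) are hypothetical, only the Gaussian kernel is shown, the bandwidth `h` must be passed explicitly (Silverman's rule is not reproduced here), and variances and covariances are population moments (divided by `n`), matching the expectation notation in the formulas.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nw_smooth(xs, ys, grid, h, weights=None):
    """Nadaraya-Watson estimate at each grid point.

    Computes sum_i W_i(x) * y_i / sum_i W_i(x) with
    W_i(x) = w_i * K((x - x_i) / h).
    """
    if weights is None:
        weights = [1.0] * len(xs)  # assumption: unit weights when none are mapped
    fitted = []
    for g in grid:
        w = [wi * gaussian_kernel((g - xi) / h) for xi, wi in zip(xs, weights)]
        total = sum(w)
        if total > 0:
            fitted.append(sum(wi * yi for wi, yi in zip(w, ys)) / total)
        else:
            fitted.append(float("nan"))  # no kernel mass at this grid point
    return fitted


def moments(xs, ys):
    """Return E[X], E[Y], var(X), var(Y), cov(X, Y) as population moments."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    var_x = sum((x - mx) ** 2 for x in xs) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return mx, my, var_x, var_y, cov


def ols_line(xs, ys):
    """Intercept a and slope b with b = cov(X, Y) / var(X)."""
    mx, my, var_x, _, cov = moments(xs, ys)
    b = cov / var_x
    return my - b * mx, b


def tls_line(xs, ys):
    """Intercept a and slope b using the total least squares slope formula.

    Undefined when cov(X, Y) is zero, as in the formula itself.
    """
    mx, my, var_x, var_y, cov = moments(xs, ys)
    d = var_y - var_x
    b = (d + math.sqrt(d * d + 4.0 * cov * cov)) / (2.0 * cov)
    return my - b * mx, b
```

On perfectly linear data both straight-line methods recover the same line. On noisy data the total least squares slope is generally steeper in magnitude than the ordinary least squares slope, which is the subtle difference the penguins example displays.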