Skip to content

fixed duplicate geocode#407

Open
KanishJebaMathewM wants to merge 4 commits intovalhalla:masterfrom
KanishJebaMathewM:fix/deduplicate-nominatim-results
Open

fixed duplicate geocode#407
KanishJebaMathewM wants to merge 4 commits intovalhalla:masterfrom
KanishJebaMathewM:fix/deduplicate-nominatim-results

Conversation

@KanishJebaMathewM
Copy link
Copy Markdown
Contributor

Fixes #406
image

[
    {
        "place_id": 88664949,
        "licence": "Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright",
        "osm_type": "relation",
        "osm_id": 71525,
        "lat": "48.8534951",
        "lon": "2.3483915",
        "class": "boundary",
        "type": "administrative",
        "place_rank": 12,
        "importance": 0.897098092136026,
        "addresstype": "city",
        "name": "Paris",
        "display_name": "Paris, Ile-de-France, Metropolitan France, France",
        "boundingbox": [
            "48.8155755",
            "48.9021560",
            "2.2241220",
            "2.4697602"
        ]
    },
    {
        "place_id": 88715228,
        "licence": "Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright",
        "osm_type": "relation",
        "osm_id": 7444,
        "lat": "48.8588897",
        "lon": "2.3200410",
        "class": "boundary",
        "type": "administrative",
        "place_rank": 15,
        "importance": 0.897098092136026,
        "addresstype": "suburb",
        "name": "Paris",
        "display_name": "Paris, Ile-de-France, Metropolitan France, France",
        "boundingbox": [
            "48.8155755",
            "48.9021560",
            "2.2241220",
            "2.4697602"
        ]
    },
    {
        "place_id": 310325807,
        "licence": "Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright",
        "osm_type": "relation",
        "osm_id": 115357,
        "lat": "33.6617962",
        "lon": "-95.5555130",
        "class": "boundary",
        "type": "administrative",
        "place_rank": 16,
        "importance": 0.5298648354283636,
        "addresstype": "town",
        "name": "Paris",
        "display_name": "Paris, Lamar County, Texas, 75460, United States",
        "boundingbox": [
            "33.6206345",
            "33.7383866",
            "-95.6279396",
            "-95.4354115"
        ]
    },
    {
        "place_id": 88625937,
        "licence": "Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright",
        "osm_type": "relation",
        "osm_id": 1641193,
        "lat": "48.8588897",
        "lon": "2.3200410",
        "class": "boundary",
        "type": "administrative",
        "place_rank": 14,
        "importance": 0.4594627538257798,
        "addresstype": "city_district",
        "name": "Paris",
        "display_name": "Paris, Ile-de-France, Metropolitan France, France",
        "boundingbox": [
            "48.8155755",
            "48.9021560",
            "2.2241220",
            "2.4697602"
        ]
    }
]

fixed and tested the duplicate entries in API response when selecting a waypoint

@github-actions
Copy link
Copy Markdown

Preview is ready! 🚀 You can view it here: https://valhalla-app-tests.gis-ops.com/407

@nilsnolde
Copy link
Copy Markdown
Member

yeah, you're right, of course we can filter dups as well. it's generally a good idea for sure.

especially your paris example is so frustrating! looking at the osm relations, they seem to be exactly the same (geometry). what's the difference then.. I didn't go through the tons of attributes to understand, but for me it looks a bit data-broken.

did you already ask upstream in a nominatim discussion? I'd be interested to hear a pro's opinion why that's not somehow handled during/post data ingestion on nominatim side.

@nilsnolde
Copy link
Copy Markdown
Member

asked here: osm-search/Nominatim#4058.

Copy link
Copy Markdown
Member

@nilsnolde nilsnolde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some small things, overall looks good!

Comment on lines +53 to +56
const normalizedTitle = result.display_name
.toLowerCase()
.normalize('NFD')
.replace(/[\u0300-\u036f]/g, '');
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you leave some explanation what this represents?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually normalizes the dedup comparison, toLowercase makes it case insensitive , normalize('NFD') for characters like é and regex operation removes those . Eg: Île-de-France and Ile-de-France are same

VITE_NOMINATIM_URL=https://nominatim.openstreetmap.org
VITE_TILE_SERVER_URL="https://tile.openstreetmap.org/{z}/{x}/{y}.png"
VITE_CENTER_COORDS="52.51831,13.393707"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated whitespace change, revert pls

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

Comment on lines +57 to +59
const roundedLat = parseFloat(result.lat).toFixed(4);
const roundedLon = parseFloat(result.lon).toFixed(4);
const dedupeKey = `${normalizedTitle}${roundedLat}${roundedLon}`;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that makes sense to me deduping

Comment on lines +61 to +64
if (seenKeys.has(dedupeKey)) {
continue;
}
seenKeys.add(dedupeKey);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but there has to be some sort of set in JS (i.e. unique element container) which can do this in one line?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah but seenKeys also does the same function :)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha ok then follow up question: why you check for "has"?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the key is already in the set, we skip it - if not, we add it. that's it

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's the whole point of a set: it only allows unique elements, so a seenKeys.add(dedupeKey) is all it should need for such a structure.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops you are right, we dont want .has() here , we can use .add() to keep it simple

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

 processedResults.push({
        title:
          result.display_name.length > 0
            ? result.display_name
            : lngLat?.toString() || '',
        description: `https://www.openstreetmap.org/${result.osm_type}/${result.osm_id}`,
        selected: false,
        addresslnglat: [parseFloat(result.lon), parseFloat(result.lat)],
        sourcelnglat:
          lngLat === undefined
            ? [parseFloat(result.lon), parseFloat(result.lat)]
            : lngLat,
        displaylnglat:
          lngLat !== undefined
            ? lngLat
            : [parseFloat(result.lon), parseFloat(result.lat)],
        key: index,
        addressindex: index,
      });

but wait, the job of .has() is to get out of the loop if it is already present and never execute the processedResults.push() , hope that makes sense (lines 66 - 84)

@nilsnolde
Copy link
Copy Markdown
Member

nilsnolde commented Mar 29, 2026

uh, also, we still get 2 duplicates for "Paris":

image

so smth is not working here as it should. note, on master we get even 3 duplicates with the same name & geometry.

@KanishJebaMathewM
Copy link
Copy Markdown
Contributor Author

uh, also, we still get 2 duplicates for "Paris":
so smth is not working here as it should. note, on master we get even 3 duplicates with the same name & geometry.

will investigate and fix soon

@KanishJebaMathewM
Copy link
Copy Markdown
Contributor Author

image

client-side dedup is now working correctly, however, nominatim still returns 4 entries from the API :

  • entries with same bounding box but different lat/lon and place_id
  • entries with same lat/lon but different place_id and place_rank.
[
    {
        "place_id": 88703675,
        "licence": "Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright",
        "osm_type": "relation",
        "osm_id": 71525,
        "lat": "48.8534951",
        "lon": "2.3483915",
        "class": "boundary",
        "type": "administrative",
        "place_rank": 12,
        "importance": 0.897098092136026,
        "addresstype": "city",
        "name": "Paris",
        "display_name": "Paris, Ile-de-France, Metropolitan France, France",
        "boundingbox": [
            "48.8155755",
            "48.9021560",
            "2.2241220",
            "2.4697602"
        ]
    },
    {
        "place_id": 90003570,
        "licence": "Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright",
        "osm_type": "relation",
        "osm_id": 7444,
        "lat": "48.8588897",
        "lon": "2.3200410",
        "class": "boundary",
        "type": "administrative",
        "place_rank": 15,
        "importance": 0.897098092136026,
        "addresstype": "suburb",
        "name": "Paris",
        "display_name": "Paris, Ile-de-France, Metropolitan France, France",
        "boundingbox": [
            "48.8155755",
            "48.9021560",
            "2.2241220",
            "2.4697602"
        ]
    },
    {
        "place_id": 308463887,
        "licence": "Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright",
        "osm_type": "relation",
        "osm_id": 115357,
        "lat": "33.6617962",
        "lon": "-95.5555130",
        "class": "boundary",
        "type": "administrative",
        "place_rank": 16,
        "importance": 0.5298648354283636,
        "addresstype": "town",
        "name": "Paris",
        "display_name": "Paris, Lamar County, Texas, 75460, United States",
        "boundingbox": [
            "33.6206345",
            "33.7383866",
            "-95.6279396",
            "-95.4354115"
        ]
    },
    {
        "place_id": 89055753,
        "licence": "Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright",
        "osm_type": "relation",
        "osm_id": 1641193,
        "lat": "48.8588897",
        "lon": "2.3200410",
        "class": "boundary",
        "type": "administrative",
        "place_rank": 14,
        "importance": 0.4594627538257798,
        "addresstype": "city_district",
        "name": "Paris",
        "display_name": "Paris, Ile-de-France, Metropolitan France, France",
        "boundingbox": [
            "48.8155755",
            "48.9021560",
            "2.2241220",
            "2.4697602"
        ]
    }
]

Our fix deduplicates using normalizedTitle + boundingbox as the key, falling back to normalizedTitle + lat/lon when boundingbox is missing and the root cause seems to be in nominatim's side osm-search/Nominatim#4058

Comment on lines 66 to 84
processedResults.push({
title:
result.display_name.length > 0
? result.display_name
: lngLat?.toString() || '',
description: `https://www.openstreetmap.org/${result.osm_type}/${result.osm_id}`,
selected: false,
addresslnglat: [parseFloat(result.lon), parseFloat(result.lat)],
sourcelnglat:
lngLat === undefined
? [parseFloat(result.lon), parseFloat(result.lat)]
: lngLat,
displaylnglat:
lngLat !== undefined
? lngLat
: [parseFloat(result.lon), parseFloat(result.lat)],
key: index,
addressindex: index,
});
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is gets executed if we use .add() in the the set instead of .has() followed by continue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Duplicate entries in API response when selecting waypoint

2 participants