Hi @sebastianraabe / @danielschwarz
I’m stuck on a behavior in the POI bulk import and would like to know whether it’s intentional. Maybe one of you has already solved this or can tell me what the intended approach is. I ran a few tests against the API in the meantime, so this post is a bit longer than the original question.
What this is about
I’m importing all banks from OpenStreetMap (i.e. amenity=bank) for a control center. That’s roughly 170 POIs for a city plus its surrounding area. With the normal POST /API/v1/POI/{dpcId}, about a hundred of them fail with this message:
A point of interest with this name already exists.
Excerpt from the log:
Commerzbank (node/313191615)
Commerzbank (node/409097932)
Commerzbank (node/431258901)... (17 Commerzbank nodes in total)
Stadtsparkasse Düsseldorf (node/241783391)
Stadtsparkasse Düsseldorf (node/253068231)... (19 nodes in total)
Sparkasse HRV, Deutsche Bank, Postbank, Targobank, Santander, Volksbank, VR Bank, Sparda Bank, Kreissparkasse Düsseldorf, all multiple times.
These aren’t OSM duplicates but genuinely different branches, each with its own address, its own coordinates, and its own OSM node. Every larger city has several Commerzbank branches; that’s simply how branch networks work.
What particularly surprises me: it happens even when the control center was completely empty before the import. The first Commerzbank is created cleanly, but the second one in the same batch fails because SD now knows the name, and it knows it only from the POI that the importer itself pushed in a second earlier.
My first thought was: that’s what the Shelf Endpoint solves
You have a field made for exactly this: Origin: { namespace, id } (for OSM data, for example, { "OpenStreetMap", "node/313191615" }), which makes such a POI unique. The Shelf Endpoint even offers an explicit identificationStrategy=OriginIdOnly. The whole Shelf model is built on the idea that Origin provides uniqueness, so I wanted to test whether the Shelf path treats the constraint differently from the classic single endpoint.
What I tested
I ran a small test script against POST /API/v1/Shelf/{dpcId}/POI. Each payload contained two POIs with the same name __shelfprobe_…, different Origin IDs (probe/…/a and probe/…/b), and the namespace ShelfProbe. I ran three variations, all in the same empty test control center.
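To make the setup concrete, here is a minimal sketch of how such a probe payload could be built. It is not the original script: the JSON field names (`name`, `origin`, `namespace`, `id`) and the `run_id` placeholder that stands in for the elided part of the name and Origin IDs are assumptions for illustration and would need to be checked against the actual Shelf API schema.

```python
import json


def build_probe_payload(run_id: str) -> list[dict]:
    """Build two POIs that share one name but carry different origin IDs.

    All field names here are assumed, not taken from the API documentation.
    `run_id` is a hypothetical placeholder that keeps probe runs apart.
    """
    shared_name = f"__shelfprobe_{run_id}"
    return [
        {
            "name": shared_name,
            "origin": {
                "namespace": "ShelfProbe",
                "id": f"probe/{run_id}/{suffix}",
            },
        }
        for suffix in ("a", "b")
    ]


if __name__ == "__main__":
    # Print the payload instead of sending it; the request itself is omitted
    # because the exact body envelope and auth scheme are not known here.
    print(json.dumps(build_probe_payload("demo"), indent=2))
```

The only property that matters for the test is that both entries collide on the name while remaining distinct by Origin.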
First, DryRun with identificationStrategy=OriginIdOnly.
Status Ended, both POIs with hasProblems: false, amountFailed: 0. DryRun says: everything is fine.
Second, CreateOnly with an actual write operation, same identification strategy.
Status EndedWithProblems, amountCreated: 1, amountFailed: 1. POI A goes through, POI B gets:
"problems": { "PoiNameAlreadyExists": "A point of interest with this name already exists."}
So it’s the same error as with the single endpoint, just wrapped one level deeper in the batch result.
Third, without createMode (default behavior). After test two, POI A was still in the DB. Status EndedWithProblems, and POI A now even reports two problems at once:
"problems": { "PoiNameAlreadyExists": "A point of interest with this name already exists.", "ShelfEntityOriginIdAlreadyExists": "ShelfEntityOriginIdAlreadyExists"}
POI B doesn’t appear in the result at all. Looks like failureMode defaults to StopOnFirstFailure and the batch aborts after the first error.
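For comparing the three runs, it helps to reduce each batch result to "which entry failed with which problem codes". The following is a sketch under assumptions: only `problems`, `amountCreated`, `amountFailed` and the status values appear in the responses quoted above, while the `items` list, the `originId` key, and the sample IDs are hypothetical names chosen for illustration. The sample dictionary is constructed to mirror test two, it is not a captured response.

```python
def collect_problems(result: dict) -> dict[str, list[str]]:
    """Map each failed entry's origin ID to its sorted problem codes.

    Assumes a result shaped like {"items": [{"originId": ..., "problems": {...}}]};
    that envelope is hypothetical and must be adapted to the real response.
    """
    failed: dict[str, list[str]] = {}
    for item in result.get("items", []):
        problems = item.get("problems") or {}
        if problems:
            failed[item["originId"]] = sorted(problems)
    return failed


# Constructed sample that mirrors test two (CreateOnly, second POI rejected).
sample = {
    "status": "EndedWithProblems",
    "amountCreated": 1,
    "amountFailed": 1,
    "items": [
        {"originId": "probe/demo/a", "problems": {}},
        {
            "originId": "probe/demo/b",
            "problems": {
                "PoiNameAlreadyExists": "A point of interest with this name already exists."
            },
        },
    ],
}

if __name__ == "__main__":
    print(collect_problems(sample))
```

An entry that is missing from the result entirely, like POI B in test three, would not show up here either, so the number of returned entries should be compared against the number sent.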
What I take from this
I would have expected identificationStrategy=OriginIdOnly to signal to the server: if the origin is new, it’s a new record, and the name doesn’t matter. In reality, SD enforces name uniqueness through the Shelf path as well. In practice that means the origin-based identification mechanism that you yourselves provide in the API model is overridden by the name constraint.
For stations or individual hospitals I completely understand the unique name. There really should be only one per control center. But for banks, pharmacies, supermarkets, and gas stations, multiple occurrences are the rule, and the data from OSM (or from any normal CSV import) reflects that.
One small side finding: DryRun doesn’t detect the collision. Test one reports hasProblems: false for exactly the scenario that fails in test two. For a preview function (i.e. showing the user what would happen in a real import) that’s a problem, because the conflict can only be verified by actually writing.
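Until that is clarified, an importer can at least detect name collisions on the client side before anything is written. Here is a minimal sketch, assuming each POI is a dictionary with `name` and `origin.id` (assumed field names) and that the server compares names exactly; whether SD ignores case or whitespace is unknown to me. `existing_names` is a hypothetical parameter for names already present in the control center.

```python
from collections import defaultdict


def find_name_collisions(pois, existing_names=()):
    """Return {name: [origin IDs]} for names that would violate uniqueness.

    A name collides if it occurs more than once in the batch or is already
    present in `existing_names`. Exact string comparison is an assumption.
    """
    ids_by_name = defaultdict(list)
    for poi in pois:
        ids_by_name[poi["name"]].append(poi["origin"]["id"])
    existing = set(existing_names)
    return {
        name: ids
        for name, ids in ids_by_name.items()
        if len(ids) > 1 or name in existing
    }


# Three entries taken from the log excerpt above.
batch = [
    {"name": "Commerzbank", "origin": {"namespace": "OpenStreetMap", "id": "node/313191615"}},
    {"name": "Commerzbank", "origin": {"namespace": "OpenStreetMap", "id": "node/409097932"}},
    {"name": "Stadtsparkasse Düsseldorf", "origin": {"namespace": "OpenStreetMap", "id": "node/241783391"}},
]

if __name__ == "__main__":
    for name, ids in find_name_collisions(batch).items():
        print(f"{name}: {len(ids)} entries would collide ({', '.join(ids)})")
```

This only predicts the conflict; it doesn’t answer how the colliding branches should be named.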
My questions
- Is the control center-wide name uniqueness on POIs intentional? If so, what’s the intended workflow for branch networks? Should one write the district or address into the name, even though Origin already uniquely identifies the POI?
- If intentional, is there an established convention for this? How do other users handle OSM or CSV bulk imports? “Commerzbank, Königsallee 14” works technically, but makes the list long and unwieldy. Every suffix workaround feels to me a bit like circumventing the constraint.
- If not intentional, is there already a ticket for this, or should I open one? From my perspective, the consistent approach would be to enforce uniqueness exclusively through Origin (that’s what the field is for) and limit the name constraint to categories where it makes sense from a business perspective.
- Regarding DryRun: Is it intentional that DryRun doesn’t check the name constraint? If so, is there another way to reliably predict conflicts before actually writing?
But maybe I’m fundamentally misunderstanding something here.
Thanks a lot!
Best regards
Pat