User:DarTar/Data model feedback
Some feedback on the data and data model as of January 4, 2020.
Observation entities
- Item description
- I feel a description like "Observation of Diodia virginiana" may not be persistent if the ID changes, whereas a description like "iNaturalist observation by dartar" should, in principle, be persistent. Is the rationale for adding the taxon to make the items searchable?
- Observed on
- Is there a reason why the time is stripped? I see the iNat API gives a full timestamp as a response, e.g.
"time_observed_at": "2019-12-28T21:22:10-05:00"
- Observer
- Can you strip the full URL and leave just the username?
- Scientific name
- If you have it in the observation data, it would be useful to store the most recent observation-level taxon ID in the observation record itself, instead of only resolving to the QID of the taxon. This would allow querying and filtering the observation records directly.
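To illustrate the "Observed on" and "Observer" points above, here is a minimal Python sketch. It assumes the timestamp arrives as an ISO 8601 string like the `time_observed_at` example, and that the observer is currently stored as a profile URL whose last path segment is the username. The URL shape and both function names are assumptions for illustration, not something taken from the importer.

```python
from datetime import datetime
from urllib.parse import urlparse


def parse_observed_at(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, keeping the time of day and UTC offset."""
    return datetime.fromisoformat(value)


def username_from_url(value: str) -> str:
    """Return the last path segment of a profile URL.

    A bare username (no scheme, no slashes) is returned unchanged.
    """
    path = urlparse(value).path
    return path.rstrip("/").split("/")[-1] if path else value


observed = parse_observed_at("2019-12-28T21:22:10-05:00")
print(observed.isoformat())          # full timestamp, offset preserved
print(observed.date().isoformat())   # date-only value, if still needed

# Hypothetical profile URL; the real stored value may differ.
print(username_from_url("https://www.inaturalist.org/people/dartar"))
```

Keeping the full timestamp does not prevent date-only queries, since the date can always be derived from it, whereas the reverse is not possible.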
Taxon entities
- Taxon hierarchy
- I don't know what you can / want to retrieve about taxa but it would be fantastic to have at least the following represented:
- the iNat taxon ID as a statement
- a link to the parent taxon
- a statement with the parent taxon ID
- some basic ID mapping (GBIF / Wikidata for starters?)
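As a rough sketch of what the list above could look like in practice, the following Python dataclass models a taxon entity with its own iNat taxon ID, a link to the parent item, the parent's taxon ID, and a small external ID mapping. All field names and all values are hypothetical placeholders, not real identifiers or the actual Wikibase properties.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TaxonEntity:
    """Hypothetical shape of a taxon entity; names are illustrative only."""
    entity_id: str                       # local Wikibase item ID (placeholder)
    inat_taxon_id: int                   # iNat taxon ID stored as a statement
    parent_entity_id: Optional[str]      # link to the parent taxon item
    parent_inat_taxon_id: Optional[int]  # parent taxon ID stored as a statement
    external_ids: Dict[str, str] = field(default_factory=dict)  # e.g. GBIF, Wikidata


def children_of(taxa: List[TaxonEntity], parent_inat_taxon_id: int) -> List[TaxonEntity]:
    """Filter taxa by parent taxon ID without resolving the parent item."""
    return [t for t in taxa if t.parent_inat_taxon_id == parent_inat_taxon_id]


# Placeholder data: none of these IDs correspond to real taxa.
taxa = [
    TaxonEntity("Q1", 100, None, None),
    TaxonEntity("Q2", 101, "Q1", 100, {"gbif": "0000000", "wikidata": "Q0"}),
    TaxonEntity("Q3", 102, "Q1", 100),
]
print([t.entity_id for t in children_of(taxa, 100)])  # ['Q2', 'Q3']
```

Storing the parent taxon ID as a plain value alongside the item link is what makes the filter in `children_of` possible without a second lookup.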
Wikibase data model
- Instance of?
- Wouldn't it be useful to have a notion of "instance of" to separate different types of entities?
- Multi-language support
- In my opinion, there's no need to have multiple languages in the Wikibase data model, since an observation will likely just include structured data. The only exception would be for taxon entities if you're planning to ingest localized common names.
Data
- CC0 or all observations?
- The data is too sparse for meaningful analyses and queries: the most common taxa have fewer than 300 data points. I would consider ingesting the entire iNat data dump from GBIF, and maybe creating a smaller instance with the exact same data model for fast prototyping / debugging.