Feedback (and question) about REST API for historical data use case


#1

Hello all at WeatherFlow!

Today, I want to share with you some feedback (and questions / suggestions) about the REST API. This feedback comes directly from the following use case:

I’ve developed an application (as a matter of fact, it’s a plugin for a larger application, but that does not change the substance of the subject) that has the following goals:

  • to collect “current” data, on a regular basis, from a WF station
  • to store it locally to compile historical data
  • to display (locally stored) current or historical data (in widgets, graphs, charts, etc.)

As you may infer, the historical data is compiled and stored locally only while the application runs. If the application fails, or is stopped, the historical data will have “holes”. And of course, there’s no historical data prior to the application installation…
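A minimal sketch of how such “holes” could be located in the local store, so that only the missing ranges need to be imported. It assumes the stored samples carry epoch-second timestamps and arrive roughly every 60 seconds; the function name, the interval and the tolerance are illustrative assumptions, not anything defined by WF.

```python
def find_gaps(timestamps, expected_interval=60, tolerance=1.5):
    """Return (gap_start, gap_end) pairs where consecutive samples are too far apart.

    timestamps: iterable of epoch seconds (assumed storage format).
    expected_interval: assumed sampling period, in seconds.
    tolerance: how many periods may elapse before a pause counts as a hole.
    """
    ordered = sorted(timestamps)
    gaps = []
    # Compare each sample with the one that follows it.
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > expected_interval * tolerance:
            gaps.append((previous, current))
    return gaps
```

Each returned pair could then serve as the timeframe of one import request.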

Now, I want to allow users of my application to “import” old data from WF: data that pre-dates the installation of this application.

And this is where it gets complicated :yum:

To collect data, I use the /observations/station/ endpoint, which is well suited to what I want: a “best” view of all the station’s probes in one consistent set. Great!

But, to import old data, /observations/station/ is not suitable: it provides only current data. So I must use the /observations/device/ endpoint which, as its name suggests, is not based on an “aggregated” view of the station but on a per-device (probe) data set. That is a true paradigm shift, with all that it implies in terms of a new processing workflow.

So my first remark / question / suggestion (remove the useless word) is:

Why not have an endpoint, similar to /observations/station/, to retrieve old data in the same “format” as /observations/station/? Do you have it in your backlog? Is it something realistic / feasible?

If not, I will use /observations/device/. And here is my second question:

How can I use it to simulate the view provided by the /observations/station/ endpoint? What heuristics should I use?

I know that all this may seem odd, but I also know from experience that the value of an API appears when it is used in concrete cases. And this is my case :blush:

Maybe I missed something obvious. Maybe my explanation is betrayed by my English skills. In any case, do not hesitate to tell me …


#2

I understand what you are asking.

The data you receive from /observations/station/ is the data from /observations/device/ for the two main devices, except it also contains data that is calculated from historical records. You can create the same from the stored data.

Remember, WeatherFlow has the same data you do from the Hub. It is very easy to retrieve and store the data from WeatherFlow. Once the data is local, you can pick a minute of data from the Sky and Air and then use that data to create the same format that is produced by /observations/station/.
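As a rough illustration of that idea, the sketch below pairs each stored Air record with the Sky record closest in time and merges the two into one station-like record. The record layout (plain dictionaries with a "timestamp" key in epoch seconds), the field names in the usage example and the 60-second skew limit are all assumptions made for the example; they are not the actual /observations/device/ format, and the derived values are not computed here.

```python
def build_station_view(air_records, sky_records, max_skew=60):
    """Pair each Air record with the nearest Sky record and merge them.

    Records are dicts with a "timestamp" key (epoch seconds); this layout
    is hypothetical. Air records without a Sky record within max_skew
    seconds are skipped.
    """
    views = []
    for air in air_records:
        # Find the Sky record whose timestamp is closest to this Air record.
        closest = min(
            sky_records,
            key=lambda sky: abs(sky["timestamp"] - air["timestamp"]),
            default=None,
        )
        if closest is None:
            continue
        if abs(closest["timestamp"] - air["timestamp"]) > max_skew:
            continue
        # Air values win on key collisions, so the Air timestamp is kept.
        views.append({**closest, **air})
    return views
```

For example, with made-up field names:

```python
air = [{"timestamp": 120, "air_temperature": 21.5}]
sky = [{"timestamp": 100, "wind_avg": 3.2}, {"timestamp": 400, "wind_avg": 5.0}]
print(build_station_view(air, sky))
```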

You should back up a few steps and write your own routines to replace the /observations/station/ API request. Once you have the archival data stored locally, you don’t need to use the REST API.

I do all the same from the UDP data. I have no need for the REST API.


#3

Hello @GaryFunk!

OK, so now, I’m sure I was definitely unclear, sorry…

My main goal is to allow importing old data stored by WF into my software.
For now, /observations/station/ doesn’t provide the time_start and time_end parameters that control the timeframe of the query. To import old data, my only option is /observations/device/, which accepts time_start and time_end but doesn’t provide an “aggregated view” of the station.
If I want this “aggregated view” of the station, it’s because I don’t know how to manage multiple probes. Computed indexes are not my point here. My point is: how can I build this station view myself when a station has several probes? What heuristic does WF use to publish an outdoor temperature (in /observations/station/) when there are, say, 3 Air modules? (That’s just an example.)
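For the import itself, a long timeframe could be split into smaller windows, each one becoming the time_start / time_end pair of a separate /observations/device/ request. The sketch below only computes those pairs. It assumes both parameters are epoch seconds, and the one-day window size is an arbitrary choice for the example, not a documented limit.

```python
def day_windows(time_start, time_end, window_seconds=86400):
    """Split [time_start, time_end) into consecutive windows.

    Returns a list of dicts usable as query parameters, one per request.
    Timestamps are assumed to be epoch seconds.
    """
    windows = []
    start = time_start
    while start < time_end:
        # The last window is shortened so it never passes time_end.
        end = min(start + window_seconds, time_end)
        windows.append({"time_start": start, "time_end": end})
        start = end
    return windows
```

Requesting one window at a time also keeps each query small, which is a design choice of this sketch rather than a requirement stated by WF.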

That’s why I also proposed making the two endpoints consistent, by adding time_start and time_end parameters to the /observations/station/ endpoint.

/observations/station/ would query a “station view”, for “now” or for a specified timeframe.
/observations/device/ would query a specific device, for “now” or for a specified timeframe.

I think that would be a good idea :slight_smile:


#4

Correct. You have to create your own “aggregated view.”

That’s easy. Every station has a primary Air and a primary Sky. I will guess the primary is the first added of each. WeatherFlow uses only the primary in the “aggregated view.”

You simply do the same as WeatherFlow. Once the data is stored you use the data to create the same information.

What you are asking for is not stored on the WeatherFlow servers. It is a single data set created on demand. To create a day’s worth of data would probably overtax the servers if several requests were made at the same time.


#5

Ha. Thanks for everything, @GaryFunk! That was the information I was missing.
Just one last question, as you seem to know this better than I do :innocent: : how can I know which is the primary probe to take into account (there’s no installation date in the API)?


#6

Generally it will be the one with the lowest device_id. However, since you are writing the code, you can decide the rules. You can ask the user which device to use, or pick the one with the latest firmware or the longest uptime.


#7

Peter, congratulations: you are the first developer (to my knowledge) to ask for this “station/observation timeseries” endpoint, which is something that we perceived as important, and has been on our feature backlog, since Day 1.

As you clearly understand, the concept of the “station observation” is to aggregate or federate the data from multiple sensors at a single station into a uniform set of data. We also add “derived parameters” to this data, since some of those derived values require data from more than a single sensor device.

So, the answer is yes! We will have this feature one day. It has remained on our backlog at a relatively low priority because, until this post, no one had asked for it… But now someone has, and we will consider that when we next look at moving tasks from the backlog to the “doing” list.

As Gary pointed out, in order to simulate this data construct, you need the formulas for the derived metrics, and you pull values from the AIR or SKY as necessary. If there is more than one AIR or SKY, you have to choose one to use. We have a “primary” sensor concept on our back-end that we have not surfaced to the user (or the developer) yet, simply because not many people have more than one of each device. That is also on our backlog!

Meanwhile, your application would need to have a way to choose (or let the user choose) which device to use. Gary’s suggestion to use the lowest device_id will work for the vast majority of users. That’s how we do it currently: if a station has more than one AIR or SKY, the sensors from the lowest device_id become the primaries, by default. And only a handful of people have asked us to switch that.
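A minimal sketch of that default rule with an optional user choice on top: for each device type, keep the device with the lowest device_id unless the user picked another one. The device list layout and the "device_type" key are assumptions made for the example; only device_id comes from the discussion above.

```python
def pick_primaries(devices, overrides=None):
    """Choose one primary device per type.

    devices: list of dicts with "device_id" and "device_type" keys
             (hypothetical layout).
    overrides: optional {device_type: device_id} mapping chosen by the user.
    """
    overrides = overrides or {}
    primaries = {}
    # Default rule: the lowest device_id of each type becomes the primary.
    for device in devices:
        kind = device["device_type"]
        current = primaries.get(kind)
        if current is None or device["device_id"] < current["device_id"]:
            primaries[kind] = device
    # A user override replaces the default, but only if that device exists.
    for kind, device_id in overrides.items():
        for device in devices:
            if device["device_type"] == kind and device["device_id"] == device_id:
                primaries[kind] = device
    return primaries
```

An override that names an unknown device is silently ignored here, so the lowest-device_id default still applies; an application might prefer to report that to the user instead.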


#8

Wow! Thanks @dsj!

So happy to know that you have already thought about it. I feel less alone :relieved:
While waiting for the priority to climb up in your backlog, I will use the method you propose. But I’m impatient to have a fully consistent API on this subject :slight_smile:

If I have some other suggestions to make on historical data via the REST API (and I do :wink:), do you prefer one per post or all the suggestions in one post?