Re: [thredds] aggregating on both time and time_run dimensions?

To: THREDDS Users <thredds@xxxxxxxxxxxxxxxx>
Subject: Re: [thredds] aggregating on both time and time_run dimensions?
From: John Maurer <jmaurer@xxxxxxxxxx>
Date: Mon, 11 Aug 2025 15:43:08 -1000

Never mind, I think I have solved this. I was overthinking it. There's no
need to make time_run its own coordinate variable (i.e., its own dimension),
even though that's how FMRC does it. Instead, I just define a time_run
variable that uses the existing time dimension, like any other time-series
variable. That way a single joinExisting aggregation on time works, and a
user can get a time_run value for every time step.
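
In case it helps anyone else, here is roughly what that looks like. This is
a sketch, not our exact config. It assumes each daily file declares time_run
as an ordinary variable on the time dimension (shown in the comment), and it
reuses the scan settings from my earlier message. As far as I know, scanned
files are ordered by filename, so it also assumes date-sortable names like
model_20250812.nc.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <!-- Assumed per-file layout (illustrative, CDL-style):
         double time(time);
         double time_run(time);   // plain variable, NOT its own dimension
       joinExisting concatenates every variable whose outer dimension is
       "time", so time_run is aggregated along with the data variables and
       no nested aggregation is needed. -->
  <aggregation dimName="time" type="joinExisting">
    <!-- Same scan as before: all .nc files under the directory tree,
         skipping files modified within the last 5 minutes. -->
    <scan location="/path/to/model/data/" suffix=".nc"
          subdirs="true" olderThan="5 min" />
  </aggregation>
</netcdf>
```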
Cheers,
John


On Mon, Aug 11, 2025 at 12:39 PM John Maurer <jmaurer@xxxxxxxxxx> wrote:

> Hi TDS folks,
> We have a new use case for aggregating our FMRC collections, but I'm
> having difficulty implementing it. To save space, the files are now daily
> files rather than multi-day files. This avoids repeating the same day
> across multiple files and drastically cuts storage requirements over the
> long term. Instead, a new file for a given day overwrites any previous
> file for that day. The result is essentially a "Best Time Series"
> (nowcasts), with only the latest handful of files containing forecasts of
> future days.
>
> Inside the files, we are storing both "time" and "time_run" coordinate
> variables so that an end user will know when the model was run for each
> timestep. Since the runtime is no longer in the filenames (the date in the
> filenames indicates the day of the time steps), I am not employing the
> traditional FMRC aggregations via featureCollection. Thus, I'm trying to
> figure out how to do an NcML aggregation on a file scan that can aggregate
> over both the files' "time" and "time_run" coordinate variables to achieve
> an FMRC-like effect.
>
> I know that nested NcML aggregations are possible, but I don't know how
> they might be used to aggregate over two time variables. Is there a way?
>
> If I do this, then *time_run* (the outer aggregation) only gets the
> penultimate file's values:
>
>         <aggregation dimName="time_run" type="joinExisting">
>           <netcdf>
>             <aggregation dimName="time" type="joinExisting">
>               <scan location="/path/to/model/data/" suffix=".nc"
>                     subdirs="true" olderThan="5 min" />
>             </aggregation>
>           </netcdf>
>         </aggregation>
>
> And if I do this, then *time* (the outer aggregation) only gets the
> penultimate file's values:
>
>         <aggregation dimName="time" type="joinExisting">
>           <netcdf>
>             <aggregation dimName="time_run" type="joinExisting">
>               <scan location="/path/to/model/data/" suffix=".nc"
>                     subdirs="true" olderThan="5 min" />
>             </aggregation>
>           </netcdf>
>         </aggregation>
>
> Any ideas or suggestions on how this can be accomplished? As a fallback, I
> might have to use a more brute-force approach and append runtimes to the
> filenames (e.g., model_20250812_20250801.nc, where the second date
> indicates the time_run coordinate). But then it's no longer a simple
> overwrite of model_20250812.nc, and I would need to remove the file from
> the previous run (e.g., model_20250812_20250731.nc) when saving a new one.
>
> Many thanks!
> John Maurer
> Data System Engineer
> Pacific Islands Ocean Observing System (PacIOOS)
> University of Hawaii at Manoa
>
