Best Database For AI & IoT       



Data Convergence

Unlock the power of data convergence with JaguarDB, where vector data, time series, and location data seamlessly converge in a single, unified platform. Experience the convenience of managing and harnessing diverse data types in one place, empowering you to gain comprehensive insights and make informed decisions with unparalleled efficiency and precision.

Instant Scalability

JaguarDB is a highly scalable, fast database designed for AI, Internet of Things (IoT), and robotic systems that generate massive amounts of data. Traditional approaches to horizontal scaling, such as consistent hashing, require data migration whenever nodes are added, and that migration takes longer as datasets grow. JaguarDB's ZeroMove Hashing data distribution technology instead allows instant scaling with zero data migration. For example, scaling a JaguarDB cluster from 1000 to 4000 nodes takes only a few seconds and lets the system store four times as much data without interruption, which makes JaguarDB well suited to large-scale distributed systems.
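To make the migration cost of the traditional approach concrete, the following Python sketch builds a small consistent-hash ring and measures how many keys change owner when a cluster grows from 10 to 40 nodes. It illustrates consistent hashing only and says nothing about how ZeroMove Hashing works internally. The class and function names (`HashRing`, `moved_fraction`), the node names, and the choice of 100 virtual nodes are assumptions made for this illustration.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=100):
        points = []
        for node in nodes:
            for i in range(vnodes):
                points.append((_hash(f"{node}#{i}"), node))
        points.sort()
        self._positions = [pos for pos, _ in points]
        self._owners = [node for _, node in points]

    def owner(self, key: str) -> str:
        """Return the node owning the first ring position after the key."""
        idx = bisect.bisect_right(self._positions, _hash(key))
        return self._owners[idx % len(self._owners)]


def moved_fraction(old_nodes, new_nodes, keys):
    """Fraction of keys whose owner differs between the two clusters."""
    old_ring, new_ring = HashRing(old_nodes), HashRing(new_nodes)
    moved = sum(1 for k in keys if old_ring.owner(k) != new_ring.owner(k))
    return moved / len(keys)


keys = [f"key-{i}" for i in range(20000)]
old_nodes = [f"node-{i}" for i in range(10)]
new_nodes = [f"node-{i}" for i in range(40)]

fraction = moved_fraction(old_nodes, new_nodes, keys)
print(f"fraction of keys that must migrate: {fraction:.2f}")
```

Because 30 of the 40 nodes are new, a large share of the keys (in expectation about three quarters) ends up on a different node, and each of those keys represents data that a consistent-hashing system has to copy during the scale-out.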


AI Data

Artificial intelligence (AI) systems are trained on large amounts of data. AI algorithms use statistical techniques to find patterns and make predictions from their training data, so the more diverse and representative that data is, the better a model learns and generalizes. Accurate and reliable models therefore need training data that is high quality, well structured, and broad enough to cover a wide range of scenarios and use cases. Having plenty of good data is essential for developing robust models that can be applied in many contexts and provide value to businesses and individuals alike.

Good data comes from a well-managed database in which knowledge and facts are maintained and fed to the AI systems. The vector database in JaguarDB can store and index the embeddings of image and text data for fast search in a multi-node distributed architecture, which can be scaled out horizontally more than a million times faster than other distributed databases. The distributed storage technology also makes it easier to store large volumes of raw data such as videos and images.


The following command creates a table containing vector column v:

create table vect ( key: a int, value: v vector, b char(4) );

This command creates an index on table vect based on column v:

create index vecidx on vect( v );

Vector data can be stored in the table vect as follows:

insert into vect values (10, vector(1.2, 2.4, 3.2, 4.3, 5.7, 6.3, 100, 103.4), 'w' );
insert into vect values (20, vector(1.8, 8.4, 2.2, 1.3, 9.7, 11.3, 10, 38.4), 's' );
insert into vect values (30, vector(1.6, 4.2, 9.2, 4.3, 1.7, 15.3, 1, 13.4), 'w' );
insert into vect values (40, vector(2.6, 1.2, 7.2, 3.3, 7.3, 5.3, 8, 2.4), 's' );

Similarities to the values in the v column can be computed with the following commands, which query the table and the index respectively:

select a, b, similarity(v, vector(1, 2, 3, 4, 5, 13, 1)) sim from vect order by sim desc;
select a, similarity(v, vector(1, 2, 3, 4, 5, 13, 1)) sim from test.vect.vecidx order by sim desc;
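The ranking that these queries produce can be sketched in plain Python. The source does not state which metric `similarity` uses, so the sketch below assumes cosine similarity. The query vector is also a hypothetical 8-component vector chosen to match the dimension of the stored rows, and the helper name `cosine_similarity` is illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for zero vectors in this sketch
    return dot / (norm_a * norm_b)


# Rows from the insert statements above, keyed by column a.
rows = {
    10: [1.2, 2.4, 3.2, 4.3, 5.7, 6.3, 100, 103.4],
    20: [1.8, 8.4, 2.2, 1.3, 9.7, 11.3, 10, 38.4],
    30: [1.6, 4.2, 9.2, 4.3, 1.7, 15.3, 1, 13.4],
    40: [2.6, 1.2, 7.2, 3.3, 7.3, 5.3, 8, 2.4],
}

# Hypothetical query vector with the same dimension as the stored vectors.
query = [1, 2, 3, 4, 5, 13, 1, 2]

# Equivalent of "order by sim desc": highest similarity first.
ranked = sorted(
    ((cosine_similarity(v, query), a) for a, v in rows.items()),
    reverse=True,
)
for sim, a in ranked:
    print(f"a={a} sim={sim:.4f}")
```

A brute-force scan like this compares the query with every row, which is the work a vector index is meant to avoid on large tables.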

High Velocity Time Series Data

JaguarDB offers a fast and efficient way to ingest time series data from robotic devices, including location-based data. With indexing in both the space and time dimensions, back-filling large volumes of past time series data is also quick and easy. In JaguarDB, a time series is more than a sequence of data points indexed in time order: it also comprises a series of tick tables that store aggregated data values over specific time spans. For example, a time series table may have a base table that stores data points in time order, along with tick tables that store aggregated data for spans of 5 minutes, 15 minutes, one hour, one day, one week, and one month.

The following format describes the command to create a time series table:

create table timeseries(TICK:RETENTION, TICK:RETENTION, …|BASERETENTION)
BASETABLE (key: KEYCOL1, KEYCOL2, …, value: col rollup VTYPE, …);
     In the statement above:
     TICK:RETENTION specifies a tick type and the retention period of the tick table;
     BASERETENTION represents the retention period of the base table;
     BASETABLE is the name of the base table;
     KEYCOL1, KEYCOL2, … are the key columns in the base table;
     rollup marks the columns whose values will be rolled up to the tick tables;
     VTYPE is the type of the column to be rolled up.
     A TICK consists of a number followed by a period type letter. For example, 15s means
     a tick table of 15 seconds; 30m means a tick table of 30 minutes.
     The letter ‘s’ indicates TICKs in seconds. 
     The letter ‘m’ indicates TICKs in minutes. 
     The letter ‘h’ indicates TICKs in hours. 
     The letter ‘d’ indicates TICKs in days. 
     The letter ‘w’ indicates TICKs in weeks. 
     The letter ‘M’ indicates TICKs in months. 
     The letter ‘q’ indicates TICKs in quarters. 
     The letter ‘y’ indicates TICKs in years. 
     The letter ‘D’ indicates TICKs in decades. 
     Valid TICKs in seconds include: 1s, 2s, 3s, 5s, 6s, 10s, 12s, 15s, 20s, 30s. 
     Valid TICKs in minutes include: 1m, 2m, 3m, 5m, 6m, 10m, 12m, 15m, 20m, 30m. 
     Valid TICKs in hours include: 1h, 2h, 3h, 4h, 6h, 8h, 12h. 
     Valid TICKs in days include: 1d, 2d, 3d, 4d, 5d, 6d, 7d, 10d, 15d. 
     Valid TICKs in weeks include: 1w, 2w, 3w, 4w. 
     Valid TICKs in months include: 1M, 2M, 3M, 4M, 6M. 
     Valid TICKs in quarters include: 1q, 2q. 
     Valid TICKs in years can be any number of years.
     Valid TICKs in decades can be any number of decades. 
     Multiple TICKs are allowed in the same TICK group. For example, you can have 5m and 15m 
     tables, and 1d and 10d tick tables.
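The TICK rules listed above can be captured in a small client-side validator. The following Python sketch is not part of JaguarDB. The names `VALID_TICKS` and `parse_tick` are hypothetical, and "any number" of years or decades is assumed to mean any positive integer.

```python
import re

# Allowed counts per period letter, copied from the list above.
# None means any positive count is accepted (years and decades).
VALID_TICKS = {
    "s": {1, 2, 3, 5, 6, 10, 12, 15, 20, 30},
    "m": {1, 2, 3, 5, 6, 10, 12, 15, 20, 30},
    "h": {1, 2, 3, 4, 6, 8, 12},
    "d": {1, 2, 3, 4, 5, 6, 7, 10, 15},
    "w": {1, 2, 3, 4},
    "M": {1, 2, 3, 4, 6},
    "q": {1, 2},
    "y": None,
    "D": None,
}

_TICK_RE = re.compile(r"^(\d+)([smhdwMqyD])$")


def parse_tick(tick: str):
    """Return (count, unit) for a valid TICK such as '15s', else raise ValueError."""
    match = _TICK_RE.match(tick)
    if not match:
        raise ValueError(f"malformed TICK: {tick!r}")
    count, unit = int(match.group(1)), match.group(2)
    allowed = VALID_TICKS[unit]
    if count < 1 or (allowed is not None and count not in allowed):
        raise ValueError(f"unsupported TICK: {tick!r}")
    return count, unit


for candidate in ["15s", "30m", "1M", "7y", "7m", "45s"]:
    try:
        print(candidate, "->", parse_tick(candidate))
    except ValueError as err:
        print(candidate, "->", err)
```

The period letters are case-sensitive: `1m` is one minute while `1M` is one month, so the validator must not normalize case.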
	 

The RETENTION format is the same as the TICK format, except that any number of periods is allowed. The RETENTION specifies how long the data points in a tick table should be kept. Examples of RETENTION are 15d, 1M, 3M, and 1y. Once the retention period has passed, old data is deleted from the tick table. If no RETENTION is provided, the data points in the tick table are never deleted. The BASERETENTION specifies how long the data points in the base table should persist. Data points older than this retention period are deleted regularly. If no BASERETENTION is provided, the data points in the base table are never deleted.
A rollup column in a base table indicates that its values are rolled up to the tick tables. Each tick table saves the last value of the rollup column stored in the base table. In addition, the aggregated values ‘sum’, ‘min’, ‘max’, ‘avg’, and ‘var’ of the column are computed and stored in the tick tables. Aggregation queries are extremely fast because only a single record is read to retrieve the aggregated data, without scanning the tables to compute the result.
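The rollup idea can be illustrated with a short Python sketch that groups base-table points into fixed-width tick buckets and precomputes the aggregates for each bucket. This is a conceptual model, not JaguarDB's implementation. The function names and sample readings are hypothetical, `var` is assumed to be the population variance, and only fixed-width ticks (seconds to days) are handled.

```python
from datetime import datetime, timedelta
from statistics import pvariance

EPOCH = datetime(1970, 1, 1)


def floor_to_tick(ts: datetime, tick: timedelta) -> datetime:
    """Start of the fixed-width tick bucket that contains ts."""
    seconds = int((ts - EPOCH).total_seconds())
    width = int(tick.total_seconds())
    return EPOCH + timedelta(seconds=seconds - seconds % width)


def rollup(points, tick: timedelta):
    """Aggregate (timestamp, value) points into one record per tick bucket."""
    buckets = {}
    for ts, value in sorted(points):
        buckets.setdefault(floor_to_tick(ts, tick), []).append(value)
    return {
        start: {
            "counter": len(values),
            "last": values[-1],  # last value stored in the bucket
            "sum": sum(values),
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
            "var": pvariance(values),  # assumption: population variance
        }
        for start, values in buckets.items()
    }


# Hypothetical base-table readings.
points = [
    (datetime(2021, 2, 11, 9, 1, 10), 10.0),
    (datetime(2021, 2, 11, 9, 3, 40), 14.0),
    (datetime(2021, 2, 11, 9, 7, 5), 8.0),
]

tick_table = rollup(points, timedelta(minutes=5))
for start, record in tick_table.items():
    print(start, record)
```

Reading an aggregate for a time span then becomes a single dictionary lookup by bucket start, which mirrors the single-record read described above.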


Location and Spatial Data

JaguarDB is the only database that supports both vector and raster spatial data. Vector spatial shapes include line, square, rectangle, circle, ellipse, triangle, sphere, ellipsoid, cone, cylinder, box, 3D line, 3D square, 3D rectangle, 3D circle, 3D ellipse, and 3D triangle. Raster spatial shapes include point, multipoint, linestring, multilinestring, polygon, multipolygon, and their 3D counterparts.

Location-based data can be managed with regular tables. In a table containing spatial data types, a column type can carry a spatial reference identifier (SRID). If no SRID is provided, the default value is zero, which means a simple geometric coordinate system. In addition to the SRID, the number of metrics associated with a location point or a shape can be specified with the “metrics:” keyword. The following examples show how to create tables with spatial columns.
    create table if not exists geom ( key: a int, value: pt point(srid:4326), b int );
    create table if not exists geom2 ( key: a int, value: pt point(srid:wgs84), b int );
    create table if not exists geom3 ( key: a int, value: pt point, b int );
    create table dot ( key: a int, pt1 point, b int, pt2 point, value: c int, d int, pt3 point3d );
    create table cb ( key: a int, q1 cube, b int, q2 cube, value: c int, q3 cube );
    create table es ( key: a int, c ellipsoid, value: d int, e ellipse );
    create table linestr ( key: lsw linestring(srid:wgs84), a int, value: lss linestring );
    create table pol ( key: a int, value: po2 polygon, po3 polygon3d, tm datetime, ls linestring );
    create table mline ( key: a int, value: m multilinestring, m3 multilinestring3d );
    create table mpg ( key: a int, value: p multipolygon, p3 multipolygon3d );
    create table street ( key: a int, value: pt linestring(srid:wgs84,metrics:10), b int );
    create table base ( key: a int, value: pt point(srid:wgs84,metrics:20), char(32) );
The number of metrics is unlimited, as long as storage space allows. Each metric has a length of 8 bytes and a default value of zero. Metrics are referenced as mN, where N is the metric number, for example:
select col:m1, col:m3 from mytab where a=100 and col:x=200 and col:y=300;
Most shapes can be queried with the following functions:
  • within
  • contain
  • cover
  • coveredby
  • intersect
  • disjoint

Combining Time Series and Spatial Data

The following example demonstrates how a user can manage time series data and location-based data in one JaguarDB ‘rides’ table. The rides table is created by the following command:
    CREATE TABLE timeseries(5m,30m,1d,1M) rides (
     key:
     pickup_datetime datetimesec,
     dropoff_datetime datetimesec,
     driver_name char(16),
     rate_type char(8),
     payment_type char(1),
     value:
     passenger_count rollup int,
     trip_distance rollup float(8.2),
     pickup_location point(srid:wgs84),
     dropoff_location point(srid:wgs84),
     fare_amount rollup float(8.2),
     tip_amount rollup float(6.2),
     tolls_amount float(6.2),
     total_amount rollup float(8.2)
    );
Here ‘rides’ is the base table, and four tick tables are created for ticks of five minutes, thirty minutes, one day, and one month. Each rollup column will generate five heap columns in the tick tables. Passenger pickup and drop-off locations are represented by points with longitude and latitude coordinates in degrees. Data can be inserted as in the following example:
insert into rides values ( '2021-02-11 09:22:12', '2021-02-11 09:50:42', 'DriverAHM', 
'REG', '1', '2', '48.6', point(122.036 37.7), point(122.385 37.622), '56.5', '10.5', 
'5.0', '72.0' );

insert into rides values ( '2021-02-11 09:32:12', '2021-02-11 09:58:42', 'DriverJHS', 
'HYP', '1', '3', '49.2', point(122.035 37.369), point(122.381 37.621), '73.5', '12.5', 
'5.8', '91.8' );

insert into rides values ( '2021-02-12 09:32:12', '2021-02-12 13:50:42', 'DriverAHM', 
'REG', '1', '2', '66.8', point(121.8864 37.3382 ), point(122.382 37.622), '96.1', 
'20.5', '8.0', '124.6' );
With the data we have, we can answer the following questions:
(1) How many rides took place on each day? 

select pickup_datetime as day, counter as rides from rides@1d where 
driver_name='*' and rate_type='*' and payment_type='*';

Answer:
day=[2021-02-11 00:00:00] rides=[2]
day=[2021-02-12 00:00:00] rides=[1]

(2) How many rides took place on the day of ‘2021-02-11’? 

select pickup_datetime as day, counter as rides from rides@1d where 
driver_name='*' and rate_type='*' and payment_type='*' and 
pickup_datetime='2021-02-11 00:00:00';

Answer:
day=[2021-02-11 00:00:00] rides=[2]

(3) What is the average fare amount? 

select avg( fare_amount::avg) avg_fare_mount from rides@1M where 
driver_name='*' and rate_type='*' and payment_type='*';

Answer:
avg_fare_mount=[75.366667]

(4) What is the average fare amount in February of year 2021? 

select pickup_datetime as month, fare_amount::avg avg_fare_mount from rides@1M 
where driver_name='*' and rate_type='*' and payment_type='*' and 
pickup_datetime='2021-02-01 00:00:00';

Answer:
month=[2021-02-01 00:00:00] avg_fare_mount=[75.3666666667]

(5) What is the average fare amount for each driver? 

select driver_name, avg( fare_amount::avg) avg_fare_mount from rides@1M where 
driver_name != '*' and rate_type='*' and payment_type='*' group by driver_name;

Answer:
driver_name=[DriverAHM] avg_fare_mount=[76.3]
driver_name=[DriverJHS] avg_fare_mount=[73.5]

(6) How many rides took place for each rate type? 

select rate_type, sum(counter) rides from rides@1M where rate_type != '*' and 
driver_name='*' and payment_type='*' group by rate_type;

Answer:
rate_type=[HYP] rides=[1.0]
rate_type=[REG] rides=[2.0]

(7) What is the monthly average trip distance across all drivers? 

select pickup_datetime as month, trip_distance::avg from rides@1M where 
rate_type='*' and payment_type='*' and driver_name='*';

Answer:
month=[2021-02-01 00:00:00] trip_distance::avg=[54.8666666667]

(8) What are the monthly average trip distance and maximum average distance 
for each driver? 

select driver_name driver, pickup_datetime as month, avg(trip_distance::avg) 
avg_distance, max(trip_distance::avg ) max_avg_distance from rides@1M where 
rate_type='*' and payment_type='*' and driver_name != '*' group by driver_name;

Answer:
driver=[DriverAHM] month=[2021-02-01 00:00:00] avg_distance=[57.7] max_avg_distance=[57.7]
driver=[DriverJHS] month=[2021-02-01 00:00:00] avg_distance=[49.2] max_avg_distance=[49.2]

(9) How many rides took place in each 5-minute interval on the day of '2021-02-11'? 

select pickup_datetime time, counter rides from rides@5m where driver_name='*' 
and rate_type='*' and payment_type='*' and 
pickup_datetime >= '2021-02-11 00:00:00' and pickup_datetime < '2021-02-12 00:00:00';

Answer:
time=[2021-02-11 09:20:00] rides=[1]
time=[2021-02-11 09:30:00] rides=[1]

(10) Which rides originated from within 18 kilometers of Sunnyvale, California? 

select pickup_datetime as day from rides where distance(pickup_location, 
point( 122.035 37.369 ), 'center' ) < 18000;
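The effect of this distance filter on the three sample rides can be checked with a haversine calculation in Python. The sketch assumes that `distance` returns a great-circle distance in meters for WGS84 points, which the source does not state. The coordinates are used exactly as they appear in the insert statements, and the helper name `haversine_m` and the Earth radius constant are choices made for this illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, an approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Pickup time and pickup_location (lon, lat) of the three inserted rides.
pickups = [
    ("2021-02-11 09:22:12", (122.036, 37.7)),
    ("2021-02-11 09:32:12", (122.035, 37.369)),
    ("2021-02-12 09:32:12", (121.8864, 37.3382)),
]

center = (122.035, 37.369)  # the point used in the query above
limit_m = 18000             # the threshold used in the query above

nearby = [
    ts for ts, (lon, lat) in pickups
    if haversine_m(lon, lat, *center) < limit_m
]
for ts in nearby:
    print(ts)
```

Under these assumptions the first ride lies tens of kilometers from the center point and is filtered out, while the other two fall inside the 18000-meter radius.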

(11) What is the average total amount by 5 minutes for the day of 2021-02-11? 

select pickup_datetime start5min, window(5m, pickup_datetime), 
avg(total_amount) avg_total_amount from rides
where date(pickup_datetime)='2021-02-11'
group by pickup_datetime;

The window function creates 5-minute time windows over the ‘pickup_datetime’ column, and the average is computed within each window by grouping on the windows.


Realtime Aggregation

JaguarDB enables data analysis by computing aggregated values in real-time within user-defined time windows. This means that statistical values of data columns can be obtained instantly, without the need for additional computations. The aggregated data is readily available, facilitating seamless real-time computing and analysis. Once you experience the efficiency and power of JaguarDB, you will never want to go back to traditional methods. Embrace the future of data analysis with JaguarDB.





jaguardb.io     jaguardb.com