Motivation
Ensuring data quality is of great importance for many use cases. datajudge seeks to make this convenient.
datajudge allows for the expression of expectations held against data stored in databases. In particular, it allows for comparing different DataSources. Yet, it also comes with functionality to compare data from a single DataSource to fixed reference values derived from explicit domain knowledge.
Not trying to reinvent the wheel, datajudge relies on pytest to execute the data expectations.
Comparisons between DataSources
The data generating process can be obscure for a variety of reasons. In such scenarios, one might ask questions such as:
Has the data ‘changed’ over time?
Was the transformation of the data successful?
In both cases one might want to compare different data – either from different points in time or from different transformation steps – to each other.
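To make the idea of comparing two data sources concrete, here is a minimal sketch in plain Python using the standard library's sqlite3 module. It deliberately does not use datajudge's API; the table names (companies_v1, companies_v2) and the helper functions are hypothetical and only illustrate the kind of expectation one might hold between two snapshots of the same data, written as a pytest-style test function.

```python
import sqlite3


def row_count(conn, table):
    # Table names cannot be bound as SQL parameters; this is acceptable
    # here only because the names are fixed and trusted.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def relative_row_gain(conn, old_table, new_table):
    # Relative change in row count from the old to the new snapshot.
    old = row_count(conn, old_table)
    new = row_count(conn, new_table)
    return (new - old) / old


def build_demo_db():
    # Hypothetical in-memory database with two snapshots of one table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE companies_v1 (name TEXT)")
    conn.execute("CREATE TABLE companies_v2 (name TEXT)")
    conn.executemany(
        "INSERT INTO companies_v1 VALUES (?)",
        [("a",), ("b",), ("c",), ("d",)],
    )
    conn.executemany(
        "INSERT INTO companies_v2 VALUES (?)",
        [("a",), ("b",), ("c",), ("d",), ("e",)],
    )
    return conn


def test_row_count_did_not_shrink():
    # Expectation between two data sources: the newer snapshot
    # should not contain fewer rows than the older one.
    conn = build_demo_db()
    gain = relative_row_gain(conn, "companies_v1", "companies_v2")
    assert gain >= 0, f"row count shrank by {-gain:.0%}"
```

Writing such checks by hand means writing the query and the assertion message for every expectation, which is the kind of repetition a dedicated tool can take over.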
Why not Great Expectations?
The major selling point of datajudge is the ability to conveniently express expectations between different DataSources. Great Expectations, in contrast, focuses on expectations against a single DataSource.
Moreover, some users have pointed out the following advantages:
lots of ‘query writing’ is taken care of by having tailored Constraints
easier and faster onboarding
assertion messages with counterexamples and other context information, speeding up the data debugging process