Weeding the Data Garden: How Rulex Platform Cultivates Quality


June 4, 2024

Whether it’s an out-of-range value or an incorrect format, data quality is often overlooked, yet it is fundamental to any data-reliant process and significantly affects the results. Imagine an enterprise that undertakes a large end-to-end business transformation program, with the final aim of switching to the newest APS (Advanced Planning and Scheduling) system and all its advanced capabilities. If the data fed to that software is inconsistent or inaccurate, the results will suffer.

So we need to monitor quality to ensure adequate accuracy, and to monitor quality we first need to define what Data Quality is. In fact, it is an “umbrella term” covering different issues in the data: Accuracy, Completeness, Consistency, Timeliness, Validity and Uniqueness are some of the Data Quality dimensions. You can find more information (and more dimensions!) with a quick web search, so here we focus instead on the solution which, like the problem, is multi-faceted.
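The following plain-Python sketch shows one common way to quantify two of these dimensions. Completeness is measured as the share of non-missing values in a column, and uniqueness as the share of distinct values of a key. These metric definitions and the `sku`/`supplier` sample data are illustrative assumptions, and other tools may define the dimensions differently.

```python
def completeness(rows, column):
    """Share of rows where `column` is present (not None and not empty)."""
    filled = sum(1 for r in rows if r.get(column) not in (None, ""))
    return filled / len(rows)


def uniqueness(rows, key):
    """Share of distinct values of `key` among all rows (1.0 = no repeats)."""
    return len({r[key] for r in rows}) / len(rows)


# Hypothetical sample: one missing and one empty supplier, one repeated SKU.
rows = [
    {"sku": "A1", "supplier": "Acme"},
    {"sku": "A2", "supplier": None},
    {"sku": "A2", "supplier": "Globex"},
    {"sku": "A3", "supplier": ""},
]
print(completeness(rows, "supplier"), uniqueness(rows, "sku"))  # 0.5 0.75
```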

Rulex Platform provides different capabilities and approaches to solve different Data Quality issues, in the same way that a gardener uses various tools and practices to uproot the weeds and make the garden bloom.


When it comes to harmonizing fragmented data, handling missing values and duplicates, or fixing formatting errors and outliers, Rulex Platform can quickly spot and correct the issue.

A typical Rulex flow performs these cleansing activities among the first actions on the raw, dirty dataset. Specific tasks make it simple for any citizen developer to cleanse their data, such as:

  • Fill & Clean: which imputes missing data with fixed or dynamic values
  • Data Manager: which spots and dismisses duplicate rows with a single click

And if these are not enough, you can leverage various other features, such as:

  • An advanced join capability to merge different datasets based, for example, on string similarity
  • Statistical or textual Data Manager functions, which deal with outliers or incorrect formats
  • …and much more!
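To illustrate a join based on string similarity, the following sketch uses Python's standard `difflib.SequenceMatcher`. It pairs each record with its most similar counterpart, as long as the similarity is above a threshold. The 0.8 threshold, the case-insensitive comparison and the sample records are assumptions chosen for this example, not details of Rulex's join.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity ratio in [0, 1], ignoring case and surrounding spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def fuzzy_join(left, right, left_key, right_key, threshold=0.8):
    """Merge each left row with its most similar right row, if similar enough."""
    joined = []
    for l in left:
        best = max(right, key=lambda r: similarity(l[left_key], r[right_key]))
        if similarity(l[left_key], best[right_key]) >= threshold:
            joined.append({**l, **best})
    return joined


customers = [{"customer": "ACME Corp."}, {"customer": "Globex Inc"}]
regions = [{"company": "Acme Corp", "region": "EU"},
           {"company": "Initech", "region": "US"}]
result = fuzzy_join(customers, regions, "customer", "company")
print(result)  # [{'customer': 'ACME Corp.', 'company': 'Acme Corp', 'region': 'EU'}]
```

This brute-force comparison is quadratic in the number of records. On large datasets, candidate pairs are usually narrowed down first (for example, by a shared prefix) before similarity is computed.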

Although the above approaches have proven useful for many basic issues, in some cases a data value looks perfectly normal and yet hides an inconsistency.

Unmask inconsistency with eXplainable AI

Among all these dimensions, consistency is one of the most difficult to deal with. A target attribute is considered “consistent” if it changes in accordance with its related attributes, i.e., its values change consistently when the context changes.

The table below illustrates an example of inconsistency (guess why!):

Name      Age   Married
John      28    Yes
Mike      32    No
Paul      5     Yes
Brenda    54    Yes
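The inconsistency is in Paul's row, where a five-year-old is recorded as married. Each value is valid on its own, and only the context (Age) reveals the problem. A hand-written check makes this explicit. The minimum age of 16 is a hypothetical threshold used only for illustration.

```python
# Hypothetical rule for illustration: nobody younger than this can be married.
MIN_MARRIAGE_AGE = 16

people = [
    {"name": "John", "age": 28, "married": "Yes"},
    {"name": "Mike", "age": 32, "married": "No"},
    {"name": "Paul", "age": 5, "married": "Yes"},
    {"name": "Brenda", "age": 54, "married": "Yes"},
]


def inconsistent(row):
    """True if Married does not agree with the context given by Age."""
    return row["married"] == "Yes" and row["age"] < MIN_MARRIAGE_AGE


flagged = [p["name"] for p in people if inconsistent(p)]
print(flagged)  # ['Paul']
```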

Also, sometimes you know that a subset of your data is inconsistent, but you don’t have the proper rules to correct it.

Or you have some basic rules, but there are so many exceptions that the final correct values can hardly be identified.

Rulex approaches all the above scenarios with a disruptive solution called Robotic Data Corrections (RDC), which seamlessly proposes corrections for inconsistent data.

Behind the magic is a proprietary eXplainable AI algorithm called Logic Learning Machine, which infers a ruleset from which the correction proposals are derived. With this approach, the user simply accepts or rejects recommendations according to their domain knowledge, and the algorithm integrates this new knowledge into successive iterations. After four to five iterations, accuracy is usually close to 99%.
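The Logic Learning Machine is proprietary, so the sketch below does not reproduce it. It only illustrates the accept/reject loop described above, using a simple majority-vote rule learner as a stand-in. For each value of a driving attribute (here a hypothetical `category`), the most common target value becomes the "rule", and rows that disagree with it receive a proposal. A `review` callback plays the role of the domain expert. All names and data are hypothetical.

```python
from collections import Counter, defaultdict


def learn_rules(rows, driver, target):
    """Placeholder learner (NOT the Logic Learning Machine): map each driver
    value to the most common target value observed for it."""
    votes = defaultdict(Counter)
    for r in rows:
        votes[r[driver]][r[target]] += 1
    return {d: c.most_common(1)[0][0] for d, c in votes.items()}


def propose(rows, rules, driver, target):
    """Return (row index, proposed value) for rows that disagree with the rules."""
    return [(i, rules[r[driver]]) for i, r in enumerate(rows)
            if rules[r[driver]] != r[target]]


def correction_loop(rows, driver, target, review, max_iter=5):
    """Learn rules, propose fixes, keep only those the reviewer accepts, repeat.

    Accepted fixes change the data the next iteration learns from;
    reviewed rows (accepted or rejected) are never proposed again.
    """
    rows = [dict(r) for r in rows]  # work on a copy of the input
    reviewed = set()
    for _ in range(max_iter):
        rules = learn_rules(rows, driver, target)
        pending = [(i, v) for i, v in propose(rows, rules, driver, target)
                   if i not in reviewed]
        if not pending:
            break
        for i, value in pending:
            if review(rows[i], value):
                rows[i][target] = value
            reviewed.add(i)
    return rows


materials = [
    {"category": "bolt", "unit": "pcs"},
    {"category": "bolt", "unit": "pcs"},
    {"category": "bolt", "unit": "kg"},   # likely an entry error
    {"category": "oil", "unit": "l"},
    {"category": "oil", "unit": "l"},
    {"category": "oil", "unit": "kg"},    # the expert knows this one is right
]


def expert(row, proposed):
    """Hypothetical expert: accepts every proposal except those on oil items."""
    return row["category"] != "oil"


fixed = correction_loop(materials, "category", "unit", expert)
print([r["unit"] for r in fixed])  # ['pcs', 'pcs', 'pcs', 'l', 'l', 'kg']
```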

In addition, RDC catches any new data quality issues associated with materials being “phased in”: at steady state, minimal effort is required to attain the highest levels of accuracy.

But, as we mentioned, the realm of Data Quality is complex and the issue types are diverse. Sometimes the dependencies on driving attributes involve mathematical formulations; sometimes, even if you do have a settled ruleset, it is not easy to update; and sometimes the dependencies between rules are too complex to manage.

Luckily, the realm of the solutions provided by Rulex is also diverse.

Ignite your rules with the Rule Engine

Rulex provides a task that allows any citizen developer to write their rules with a simple syntax in a spreadsheet, import this rule file, and apply the rules to a dataset. This empowering task is called the “Rule Engine”.

The beauty of this approach is that any existing rule can be coded in the task, from the simplest rules to rules involving complex conditions or output values computed by complex mathematical or logical functions. Also, the whole process of ensuring data quality is completely in the hands of the citizen developer, who can modify rules or create new ones without relying on specialist expertise (definitely shortening the time-to-value).
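The following minimal sketch shows the spreadsheet-driven approach: a rule engine that reads rules from a CSV sheet and applies them to records. The column layout of the sheet (`attribute`, `op`, `value`, `target`, `output`) and the two sample rules are hypothetical, because the article does not show Rulex's own rule syntax. Comparison operators are mapped to Python's `operator` functions, so the engine never evaluates spreadsheet text as code.

```python
import csv
import io
import operator

# Hypothetical rule sheet (assumed layout, not Rulex's actual syntax):
# "if <attribute> <op> <value> then set <target> to <output>".
RULE_SHEET = """attribute,op,value,target,output
age,<,16,married,No
country,==,IT,currency,EUR
"""

# Operators allowed in the sheet, mapped to safe Python comparisons.
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq, "!=": operator.ne}


def load_rules(text):
    """Parse the rule sheet into a list of dicts, one per rule."""
    return list(csv.DictReader(io.StringIO(text)))


def apply_rules(rows, rules):
    """Apply every rule, in sheet order, to a copy of each row."""
    result = []
    for row in rows:
        row = dict(row)
        for rule in rules:
            current = row[rule["attribute"]]
            # Cast the sheet's text value to the type of the data value.
            operand = type(current)(rule["value"])
            if OPS[rule["op"]](current, operand):
                row[rule["target"]] = rule["output"]
        result.append(row)
    return result


people = [
    {"name": "Paul", "age": 5, "country": "IT", "married": "Yes", "currency": ""},
    {"name": "Mike", "age": 32, "country": "US", "married": "No", "currency": "USD"},
]
out = apply_rules(people, load_rules(RULE_SHEET))
print(out)
```

Because the rules live in the sheet, adding or changing a rule means editing a row of the file rather than the code.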

Finally, our Product Team is working on a solution for those who are unsure whether all their rules are properly configured.

Sharpen your rules with the Rule Enhancer

The Rule Enhancer is an innovative task that refines existing rules: think of it as a tuning tool. It requires a data (sub)set containing clean and accurate values (the so-called “ground truth”), which is used to adjust the rules, and a performance criterion (such as the F1 score). As a result, it provides fine-tuned parameters for each rule. If you are interested, hang in there a bit longer: the task will be released shortly!
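Tuning a rule against ground truth can be sketched as a search over a rule parameter for the value that maximizes the F1 score. This sketch illustrates only that general concept and is not the Rule Enhancer itself. The rule (flag a value as erroneous if it exceeds a threshold), the candidate thresholds and the labelled sample are all assumptions.

```python
def f1_score(predicted, actual):
    """F1 = harmonic mean of precision and recall over boolean labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def tune_threshold(values, labels, candidates):
    """Return the threshold t for the rule `value > t` with the best F1."""
    return max(candidates, key=lambda t: f1_score([v > t for v in values], labels))


# Hypothetical ground truth: True marks a value known to be erroneous.
values = [12, 15, 14, 90, 13, 75, 16, 40]
labels = [False, False, False, True, False, True, False, False]
best = tune_threshold(values, labels, candidates=range(10, 100, 5))
print(best)  # 40
```

With this sample, every threshold from 40 to 70 flags exactly the two erroneous values. Because `max` returns the first best candidate, the result is 40.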


Let your data bloom

Together, these approaches form the basis of a 360-degree solution that reaches top accuracy levels and can be applied in a comprehensive Data Quality pipeline, so that any kind of issue can be tackled and solved. What’s more, the implementation can be managed proficiently by any citizen developer who understands the underlying data well.
Rulex Platform provides all the solutions needed to make your knowledge blossom into colourful, accurate data.


Rulex Platform

Senior Platform Solutions Architect • Platform Solutions
