RuleScript

What is RuleScript?

RuleScript is a lightweight domain-specific language (DSL) for data transformation and validation. It lets developers, analysts, and engineers write clear, human-readable rules for cleaning and processing datasets, so the same data-quality checks can be applied across workflows.

Key Benefits of RuleScript:

  • Human-Readable Syntax: Rules are written in a simple, declarative format, making them easy to understand and maintain.
  • Focused on Data Quality: Combines transformations and validations in a single syntax to ensure clean and accurate datasets.
  • Portable Across Environments: Use RuleScript in CLI tools, APIs, or enterprise-grade systems.
  • Flexible and Extensible: Supports a wide range of transformations, validation rules, and custom conditions for advanced workflows.

By separating the definition of rules from the execution environment, RuleScript lets users focus on rule logic while RuleGenius handles execution.

Note:
Using RuleScript requires a subscription to RuleGenius, our platform for processing RuleScript files. A free plan is available with limited features and usage quotas.

How to Use RuleScript

RuleScript is the rule language of RuleGenius: you define data rules in RuleScript, and RuleGenius executes them. Here’s where and how RuleScript can be applied:

1. Command-Line Interface (CLI)

Execute RuleScript files locally with the RuleGenius CLI tool.

  • How to Use:
    Write your RuleScript rules in a .rs file, compile to .rsx, and run the CLI tool to process your data.
  • Use Cases:
    • Quick, local processing for data analysts and developers.
    • Integrate into CI/CD pipelines for automated data processing.

Example:

rg-compile rules.rs    # generates the compiled rules.rsx binary
rg-run rules.rsx input.csv output.csv

Download the CLI Tool


2. API Integration

Access RuleScript functionality programmatically via the RuleGenius API. The API accepts either a full dataset (a CSV file upload) or a single record (a JSON payload).

Example 1: Validating a CSV Payload

Submit a CSV dataset for batch validation and transformation against a compiled ruleset, which is named in the URL path.

POST /api/v1/validate/my_rules.rsx
Content-Type: multipart/form-data

Form field:
    file = input.csv (the CSV file to process)

Response:

{
    "status": "success",
    "output_file": "output.csv",
    "summary": {
        "rows_processed": 1000,
        "rows_with_errors": 5,
        "validation_errors": [
            {"row": 23, "field": "price", "error": "Price must be between 0 and 10000"}
        ]
    }
}

Example 2: Validating a Single Record (JSON Payload)

Submit a single record as a JSON payload for immediate validation.

POST /api/v1/validate/my_rules.rsx
Content-Type: application/json

{
    "record": {
        "full_name": "John Doe",
        "price": -50,
        "country_code": "US"
    }
}

Response:

{
    "status": "error",
    "errors": [
        {
            "field": "price",
            "error": "Price must be between 0 and 10000"
        }
    ]
}
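The single-record call can be prepared from Python with the standard library. This is a minimal sketch: the base URL `https://api.example.com` is a placeholder (the real RuleGenius host is not given here), and any authentication headers your account requires are omitted. The hypothetical helper only builds the request; pass it to `urllib.request.urlopen` to send it.

```python
import json
import urllib.request

# Placeholder host: substitute the RuleGenius API base URL for your account.
BASE_URL = "https://api.example.com"


def build_record_validation_request(ruleset: str, record: dict) -> urllib.request.Request:
    """Build (but do not send) POST /api/v1/validate/<ruleset> with one JSON record."""
    body = json.dumps({"record": record}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/validate/{ruleset}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_record_validation_request(
    "my_rules.rsx",
    {"full_name": "John Doe", "price": -50, "country_code": "US"},
)
# To send it: urllib.request.urlopen(req), then json.load() the response body.
```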

Use Cases for the API
  • Batch Validation: Upload CSV files to validate and transform datasets at scale.
  • Real-Time Validation: Validate individual records dynamically for applications like form submissions or automated data pipelines.

Explore the API Documentation


3. Hybrid Workflows with Pipelines

Chain multiple RuleScript rulesets for advanced workflows.

  • How to Use:
    Combine rulesets into a processing pipeline, with each step applying transformations or validations.
  • Use Cases:
    • Transform datasets from multiple sources into a unified format.
    • Validate large datasets with multiple stages of rules.

Example:

rg-transform step1.rsx input.csv \
  | rg-transform step2.rsx \
  | rg-validate rules.rsx --output output.csv

  • rg-transform step1.rsx input.csv: Applies the first set of transformations.
  • rg-transform step2.rsx: Applies additional transformations to the intermediate output.
  • rg-validate rules.rsx: Validates the transformed dataset according to the ruleset.
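Conceptually, each stage receives the previous stage's output as its input, so the order of rulesets matters. The Python sketch below models that chaining with plain functions; `step1` and `step2` are hypothetical stand-ins for step1.rsx and step2.rsx, not RuleGenius APIs.

```python
from functools import reduce
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]
Stage = Callable[[List[Row]], List[Row]]


def run_pipeline(rows: List[Row], stages: Iterable[Stage]) -> List[Row]:
    """Feed the output of each stage into the next, like a shell pipe."""
    return reduce(lambda data, stage: stage(data), stages, rows)


def step1(rows: List[Row]) -> List[Row]:
    # Hypothetical stage: rename "price_usd" to "price".
    return [{("price" if k == "price_usd" else k): v for k, v in r.items()} for r in rows]


def step2(rows: List[Row]) -> List[Row]:
    # Hypothetical stage: map country codes to names, leaving unknown codes as-is.
    names = {"US": "United States", "CA": "Canada"}
    return [{**r, "country_code": names.get(r["country_code"], r["country_code"])} for r in rows]


result = run_pipeline([{"price_usd": "200", "country_code": "US"}], [step1, step2])
```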

Learn More About Pipelines


4. On-Premises Deployment (Enterprise)

Run RuleGenius on your own infrastructure to meet compliance and privacy requirements.

  • How to Use:
    Deploy RuleGenius on your private servers or cloud containers to execute RuleScript rules securely within your environment. Keep your data private and compliant by avoiding transfers to third-party networks.
  • Use Cases:
    • Process sensitive data within secure environments.
    • Support compliance with regulations such as GDPR or HIPAA.

Request an Enterprise Demo


Why Use RuleScript?

RuleScript simplifies the creation of data transformation and validation rules, focusing on what matters: data quality. Here’s why you should adopt RuleScript for your workflows:

  • Intuitive Syntax: Easy for data analysts and developers to learn and use.
  • Unified Language for Rules: Combine transformations and validations in one cohesive syntax.
  • Reusable and Scalable: Write once, reuse rules across different datasets and environments.
  • Free and Flexible Access: Start with the free plan and upgrade to a RuleGenius subscription as your needs grow.

RuleScript’s Syntax

File Extensions: .rs (source) and .rsx (compiled)

This is what RuleScript looks like:

split "full_name" into "first_name", "last_name" by " ";
rename "price_usd" to "price";
validate "price": number, on error "Price must be numeric";
validate "country": required, in ["United States", "Canada", "Mexico"], on error "Invalid country";

Example 1: Transforming Data

Input Dataset:

full_name    price_usd    country_code
John Doe     200          US
Jane Smith   -50          CA

RuleScript:

rename "price_usd" to "price";
split "full_name" into "first_name", "last_name" by " ";
map "country_code" [US: "United States", CA: "Canada", MX: "Mexico"];
rename "country_code" to "country";

Output Dataset:

first_name   last_name   price   country
John         Doe         200     United States
Jane         Smith       -50     Canada

Example 2: Validating Data

Input Dataset:

full_name    price   country
John Doe     200     United States
Jane Smith   -50     Canada

RuleScript:

validate "price": number, min 0, max 10000, on error "Price must be valid";
validate "country": required, in ["United States", "Canada", "Mexico"], on error "Invalid country";

Validation Feedback:

full_name    price   country         error
John Doe     200     United States
Jane Smith   -50     Canada          Price must be valid

Example 3: Combining Transformations and Validations

Input Dataset:

order_id   full_name      price_usd   country_code
001        John Doe       200         US
002        Jane Smith     -50         CA
003        Mary Johnson   15000       MX

RuleScript:

rename "price_usd" to "price";
split "full_name" into "first_name", "last_name" by " ";
map "country_code" [US: "United States", CA: "Canada", MX: "Mexico"];
rename "country_code" to "country";

validate "price": number, min 0, max 10000, on error "Price must be valid";
validate "country": required, in ["United States", "Canada", "Mexico"], on error "Invalid country";

Output Dataset:

order_id   first_name   last_name   price   country         error
001        John         Doe         200     United States
002        Jane         Smith       -50     Canada          Price must be valid
003        Mary         Johnson     15000   Mexico          Price must be valid
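To make the order of operations in Example 3 concrete, here is a minimal Python sketch that applies the same steps to the input rows and reproduces the output table. It assumes that `split` drops the source column, that `map` leaves unmapped codes unchanged, that `min`/`max` are inclusive, and that failed validations are written to an `error` column (joined with "; " when several fail). It writes the mapped names to a `country` column, as the output table shows. The helper names are hypothetical and RuleGenius's actual behavior may differ.

```python
from typing import Dict, List

COUNTRIES = {"US": "United States", "CA": "Canada", "MX": "Mexico"}
ALLOWED = ("United States", "Canada", "Mexico")


def transform(row: Dict[str, str]) -> Dict[str, str]:
    out = dict(row)
    out["price"] = out.pop("price_usd")                    # rename "price_usd" to "price"
    first, _, last = out.pop("full_name").partition(" ")   # split "full_name" by " "
    out["first_name"], out["last_name"] = first, last
    code = out.pop("country_code")
    out["country"] = COUNTRIES.get(code, code)             # map codes, store under "country"
    return out


def validate(row: Dict[str, str]) -> Dict[str, str]:
    errors: List[str] = []
    try:
        price_ok = 0 <= float(row["price"]) <= 10000       # number, min 0, max 10000
    except ValueError:
        price_ok = False                                   # not numeric
    if not price_ok:
        errors.append("Price must be valid")
    if row.get("country") not in ALLOWED:                  # required, in [...]
        errors.append("Invalid country")
    return {**row, "error": "; ".join(errors)}


rows = [
    {"order_id": "001", "full_name": "John Doe", "price_usd": "200", "country_code": "US"},
    {"order_id": "002", "full_name": "Jane Smith", "price_usd": "-50", "country_code": "CA"},
    {"order_id": "003", "full_name": "Mary Johnson", "price_usd": "15000", "country_code": "MX"},
]
result = [validate(transform(r)) for r in rows]
for r in result:
    print(r["order_id"], r["first_name"], r["last_name"], r["price"], r["country"], r["error"])
```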

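Example 4: Conditional Validation

Apply a rule only to rows that meet a condition. Here, "discount" must be numeric only when "price" is greater than 100:

validate "discount": number, on error "Discount must be numeric", if "price" > 100;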
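A minimal Python sketch of how this conditional rule behaves, assuming the condition is evaluated per row, that the rule is skipped when the condition is false, and that a non-numeric "price" makes the condition false. The function names are hypothetical.

```python
from typing import Dict, Optional


def is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def check_discount(row: Dict[str, str]) -> Optional[str]:
    """Sketch of: validate "discount": number, on error "...", if "price" > 100;"""
    price = row.get("price", "")
    if not (is_number(price) and float(price) > 100):
        return None  # condition is false: the rule does not apply to this row
    if not is_number(row.get("discount", "")):
        return "Discount must be numeric"
    return None
```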

RuleScript Transformations

Transformations in RuleScript modify, clean, and reshape datasets. Below are descriptions of all available transformations.


1. Rename Field

Change the name of a column in your dataset. For example, rename "price_usd" to "price":

rename "price_usd" to "price";
  • Parameters:
    • Required: "source_field", "target_field"

2. Split Field

Split the content of a column into multiple new columns based on a delimiter. For instance, splitting "full_name" into "first_name" and "last_name":

split "full_name" into "first_name", "last_name" by " ";
  • Parameters:
    • Required: "field", target_fields (at least one target)
    • Optional: by (defaults to " ")
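The description does not say what happens when a value has more or fewer parts than there are target fields. One reasonable reading, assumed in this sketch, is that the value is split at most `len(targets) - 1` times (so the last target keeps the remainder), missing parts become empty strings, and the source column is dropped. `split_field` is a hypothetical name.

```python
from typing import Dict, List


def split_field(row: Dict[str, str], field: str, targets: List[str], by: str = " ") -> Dict[str, str]:
    """Sketch of: split "<field>" into "<t1>", "<t2>", ... by "<by>";"""
    out = {k: v for k, v in row.items() if k != field}  # assumes the source column is dropped
    parts = row[field].split(by, len(targets) - 1)       # last target keeps any remainder
    parts += [""] * (len(targets) - len(parts))         # pad when there are too few parts
    out.update(zip(targets, parts))
    return out
```

With this reading, "Mary Ann Johnson" yields a last name of "Ann Johnson"; splitting from the right instead (`str.rsplit`) would keep "Mary Ann" as the first name.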

3. Merge Fields

Combine the content of multiple columns into one. For example, merging "first_name" and "last_name" into "full_name" with a space separator:

merge "first_name", "last_name" into "full_name" with " ";
  • Parameters:
    • Required: fields, "target_field"
    • Optional: with (defaults to " ")

4. Map Values

Replace values in a column based on a predefined mapping. For example, mapping "US" to "United States":

map "country_code" [US: "United States", CA: "Canada", MX: "Mexico"];
  • Parameters:
    • Required: "field", mapping pairs (at least one mapping)

5. Convert Data Type

Change a column’s data type, such as converting a string to an integer. For example:

convert "quantity" to integer;
  • Parameters:
    • Required: "field", data_type

6. Calculate New Field

Create a new column based on a formula or logic. For example, calculating a "total" field:

calculate "total" as "price * quantity";
  • Parameters:
    • Required: "target_field", expression
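The expression is evaluated once per row against that row's fields. The sketch below is an assumption about the semantics, not RuleGenius's evaluator: it supports `+ - * /`, unary minus, parentheses, numeric literals, and field names, and walks Python's `ast` tree instead of calling `eval` so arbitrary code cannot run. `evaluate` and `calculate` are hypothetical names.

```python
import ast
import operator
from typing import Any, Dict

# Supported binary operators; anything else is rejected.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expression: str, row: Dict[str, Any]) -> float:
    """Evaluate an arithmetic expression such as "price * quantity" for one row."""

    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return float(node.value)
        if isinstance(node, ast.Name):
            return float(row[node.id])  # field reference; KeyError if the column is missing
        raise ValueError(f"unsupported syntax in {expression!r}")

    return walk(ast.parse(expression, mode="eval"))


def calculate(row: Dict[str, Any], target: str, expression: str) -> Dict[str, Any]:
    """Sketch of: calculate "<target>" as "<expression>";"""
    return {**row, target: evaluate(expression, row)}


order = calculate({"price": "2.5", "quantity": "4"}, "total", "price * quantity")
```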

7. Normalize Case

Standardize text to lowercase, uppercase, or title case. For instance, converting all text in "name" to lowercase:

normalize "name" to lowercase;
  • Parameters:
    • Required: "field", case_type

8. Replace Values

Replace specific values in a column. For example, replacing NULL with "N/A":

replace "status" null with "N/A";
  • Parameters:
    • Required: "field", replacement_value
    • Optional: value_to_replace (defaults to NULL)

9. Round Numbers

Round numeric values to a specific number of decimal places. For example, rounding "price" to two decimal places:

round "price" to 2 decimal places;
  • Parameters:
    • Required: "field", decimal_places
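The description does not specify how ties such as 2.675 are rounded. Python's built-in `round()` works on binary floats, so `round(2.675, 2)` returns `2.67` because 2.675 is stored as a slightly smaller value. This sketch instead assumes half-up rounding on the decimal text via `decimal.Decimal`; treat the rounding mode as an assumption to confirm against RuleGenius. `round_value` is a hypothetical name.

```python
from decimal import ROUND_HALF_UP, Decimal
from typing import Union


def round_value(value: Union[str, float], decimal_places: int) -> str:
    """Sketch of: round "<field>" to <n> decimal places; (half-up rounding assumed)."""
    quantum = Decimal(1).scaleb(-decimal_places)  # e.g. 2 places -> Decimal('0.01')
    # str() first so a float like 2.675 is read as the decimal text "2.675".
    return str(Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP))
```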

10. Anonymize Data

Mask or anonymize sensitive information in a column. For example, anonymizing email addresses:

anonymize "email" with "****@domain.com";
  • Parameters:
    • Required: "field"
    • Optional: with (defaults to generic anonymization, such as replacing characters with asterisks)

11. Drop Columns

Remove unnecessary columns from the dataset. For example, dropping "temp_field":

drop "temp_field";
  • Parameters:
    • Required: "field"

12. Impute Missing Values

Fill missing values in a column with a default, mean, or median value. For example, replacing missing values in "age" with the average:

impute "age" missing with average;
  • Parameters:
    • Required: "field", replacement_type
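A sketch of `impute ... missing with average`, assuming "missing" means an empty or null value and that the average is computed over the column's remaining values. A `median` option is included for comparison; that keyword, like the function name `impute`, is hypothetical.

```python
from statistics import mean, median
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def impute(rows: List[Row], field: str, replacement_type: str = "average") -> List[Row]:
    """Sketch of: impute "<field>" missing with average;"""
    present = [float(r[field]) for r in rows if r.get(field) not in (None, "")]
    if not present:
        return rows  # nothing to average; leave the column untouched
    fill = {"average": mean, "median": median}[replacement_type](present)
    return [
        {**r, field: str(fill)} if r.get(field) in (None, "") else r
        for r in rows
    ]


filled = impute([{"age": "20"}, {"age": ""}, {"age": "40"}, {"age": None}], "age")
```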

13. Reorder Columns

Change the order of columns in the dataset. For example, reordering columns to "id", "name", "email":

reorder columns "id", "name", "email";
  • Parameters:
    • Required: fields (list of columns)

14. Filter Rows

Remove rows that do not meet specific conditions. For example, keeping only rows where "age" is greater than 18:

filter "age" > 18;
  • Parameters:
    • Required: condition

RuleScript Validation Functions

Validation functions ensure your data meets specified criteria. Below are all available functions with examples and parameters.


1. Required

Ensure a field is neither empty nor null. For example, ensuring "name" is always present:

validate "name": required, on error "Name is required";
  • Parameters:
    • Required: "field"
    • Optional: on error

2. Data Type Validation

Check that a column matches the expected data type (e.g., number, integer, string, date, time, datetime):

validate "age": number, on error "Age must be numeric";
  • Parameters:
    • Required: "field", data_type
    • Optional: on error

3. Range Validation

Ensure numeric values fall within a range (min and max). For example:

validate "price": number, min 0, max 10000, on error "Price must be valid";
  • Parameters:
    • Required: "field", min, max
    • Optional: on error

4. In List

Ensure a column’s value is part of a predefined list. For example:

validate "country": in ["United States", "Canada", "Mexico"], on error "Invalid country";
  • Parameters:
    • Required: "field", list
    • Optional: on error

5. Greater Than (>), Greater or Equal (>=)

Ensure values are above a specific threshold. For example:

validate "discount": > 0, on error "Discount must be positive";
validate "price": >= 0, on error "Price must be non-negative";
  • Parameters:
    • Required: "field", threshold value
    • Optional: on error

6. Less Than (<), Less or Equal (<=)

Ensure values are below a specific threshold. For example:

validate "age": < 18, on error "Age must be below 18";
validate "discount": <= 50, on error "Discount must be 50% or less";
  • Parameters:
    • Required: "field", threshold value
    • Optional: on error

7. Equal and Not Equal

Ensure a value equals a specific value (equal), or is not one of a set of excluded values (not in). For example:

validate "status": equal "Active", on error "Status must be active";
validate "status": not in ["Inactive", "Suspended"], on error "Invalid status";
  • Parameters:
    • Required: "field", value or condition
    • Optional: on error
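The validation functions above can be read as per-field checks that must all pass, with the `on error` message reported when any of them fails. The Python sketch below models several of them (required, number, min/max, in, not in). The rule representation and function names are hypothetical, not RuleGenius's internal format, and treating `min`/`max` as inclusive is an assumption.

```python
from typing import Any, Callable, Dict, List, Optional

Check = Callable[[Any], bool]


def required(v: Any) -> bool:
    return v not in (None, "")


def number(v: Any) -> bool:
    try:
        float(v)
        return True
    except (TypeError, ValueError):
        return False


def between(lo: float, hi: float) -> Check:
    # min/max are assumed to be inclusive bounds.
    return lambda v: number(v) and lo <= float(v) <= hi


def one_of(values: List[Any]) -> Check:
    return lambda v: v in values


def none_of(values: List[Any]) -> Check:
    return lambda v: v not in values


def validate(row: Dict[str, Any], field: str, checks: List[Check], on_error: str) -> Optional[str]:
    """Return the on-error message if any check fails, else None."""
    value = row.get(field)
    return None if all(check(value) for check in checks) else on_error
```

For example, `validate "price": number, min 0, max 10000, ...;` corresponds to `validate(row, "price", [number, between(0, 10000)], "Price must be valid")`.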

Get Started with RuleScript Today!

Sign Up for Free

Experience the power of RuleScript with our free plan, perfect for small projects or testing new ideas. Gain access to essential features and start improving your data quality today.


Download the CLI Tool

Easily process datasets locally or in pipelines with the RuleGenius CLI tool. It’s quick to install and well suited to ad hoc data transformations and validations.


Explore the API Documentation

Learn how to seamlessly integrate RuleScript into your systems using our powerful API. Automate workflows, validate data dynamically, and scale effortlessly.


Request an Enterprise Demo

Discover how RuleScript and RuleGenius can transform your enterprise data workflows. See how it fits your needs for security, scalability, and compliance.