RuleScript
What is RuleScript?
RuleScript is a lightweight, domain-specific language (DSL) built for seamless data transformation and validation. Designed with simplicity in mind, RuleScript enables developers, analysts, and engineers to write clear, human-readable rules for cleaning and processing datasets, ensuring data quality across various workflows.
Key Benefits of RuleScript:
- Human-Readable Syntax: Rules are written in a simple, declarative format, making them easy to understand and maintain.
- Focused on Data Quality: Combines transformations and validations in a single syntax to ensure clean and accurate datasets.
- Portable Across Environments: Use RuleScript in CLI tools, APIs, or enterprise-grade systems.
- Flexible and Extensible: Supports a wide range of transformations, validation rules, and custom conditions for advanced workflows.
By separating rule definitions from the execution environment, RuleScript lets users focus on the logic while tools like RuleGenius handle execution.
Note:
Using RuleScript requires a subscription to RuleGenius, our platform for processing RuleScript files. A free plan is available with limited features and usage quotas.
How to Use RuleScript
RuleScript is the foundation of RuleGenius, providing the language for defining and executing data rules. Here’s where and how RuleScript can be applied:
1. Command-Line Interface (CLI)
Execute RuleScript files locally with the RuleGenius CLI tool.
- How to Use:
  Write your RuleScript rules in a .rs file, compile them to .rsx, and run the CLI tool to process your data.
- Use Cases:
  - Quick, local processing for data analysts and developers.
  - Integrate into CI/CD pipelines for automated data processing.
rg-compile rules.rs # this generates the rules.rsx binary
rg-run rules.rsx input.csv output.csv
2. API Integration
Access RuleScript functionality programmatically via the RuleGenius API. The same endpoint accepts either a full dataset (uploaded as a CSV file) or a single record (sent as JSON), as the two examples below show.
Example 1: Validating a CSV Payload
Upload a CSV dataset in the "file" form field for batch validation and transformation. The ruleset to apply (here, my_rules.rsx) is named in the URL path.
POST /api/v1/validate/my_rules.rsx
Content-Type: multipart/form-data
{
"file": "input.csv"
}
Response:
{
"status": "success",
"output_file": "output.csv",
"summary": {
"rows_processed": 1000,
"rows_with_errors": 5,
"validation_errors": [
{"row": 23, "field": "price", "error": "Price must be between 0 and 10000"}
]
}
}
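For scripted batch uploads, a request like the one above can be assembled with Python's standard library. This is a minimal sketch, not an official client: the base URL `https://api.example.com` and the helper name `build_batch_request` are hypothetical, and authentication is left out because this page does not describe it. Only the endpoint path and the "file" form field come from the example.

```python
import urllib.request
import uuid


def build_batch_request(base_url: str, ruleset: str, filename: str,
                        csv_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) a multipart POST to /api/v1/validate/<ruleset>.

    Hypothetical helper: the "file" field name follows the example above;
    authentication headers are intentionally omitted.
    """
    boundary = uuid.uuid4().hex
    # A multipart body is a sequence of parts delimited by the boundary line.
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: text/csv\r\n\r\n",
        csv_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url=f"{base_url}/api/v1/validate/{ruleset}",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


req = build_batch_request("https://api.example.com", "my_rules.rsx",
                          "input.csv", b"full_name,price\nJohn Doe,200\n")
# urllib.request.urlopen(req) would send it; the JSON reply carries "summary".
```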
Example 2: Validating a Single Record (JSON Payload)
Submit a single record for on-the-fly validation using JSON payloads.
POST /api/v1/validate/my_rules.rsx
Content-Type: application/json
{
"record": {
"full_name": "John Doe",
"price": -50,
"country_code": "US"
}
}
Response:
{
"status": "error",
"errors": [
{
"field": "price",
"error": "Price must be between 0 and 10000"
}
]
}
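A client for real-time validation mostly needs two things: wrap the record under a "record" key, and turn an error reply into something a form can display. The Python sketch below does both with the standard library; the base URL and both helper names are hypothetical, and grouping messages per field is a design choice made here so that a field with several failures keeps all of them.

```python
import json
import urllib.request


def build_record_request(base_url: str, ruleset: str,
                         record: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST wrapping one record under "record"."""
    payload = json.dumps({"record": record}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/validate/{ruleset}",
        data=payload,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def errors_by_field(response: dict) -> dict:
    """Group error messages by field; empty when the status is not "error"."""
    if response.get("status") != "error":
        return {}
    messages = {}
    for err in response.get("errors", []):
        messages.setdefault(err["field"], []).append(err["error"])
    return messages


req = build_record_request("https://api.example.com", "my_rules.rsx",
                           {"full_name": "John Doe", "price": -50, "country_code": "US"})
# The parsed reply from the example above:
reply = {"status": "error",
         "errors": [{"field": "price", "error": "Price must be between 0 and 10000"}]}
print(errors_by_field(reply))
```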
Use Cases for the API
- Batch Validation: Upload CSV files to validate and transform datasets at scale.
- Real-Time Validation: Validate individual records dynamically for applications like form submissions or automated data pipelines.
3. Hybrid Workflows with Pipelines
Chain multiple RuleScript rulesets for advanced workflows.
- How to Use:
  Combine rulesets into a processing pipeline, with each step applying transformations or validations.
- Use Cases:
  - Transform datasets from multiple sources into a unified format.
  - Validate large datasets with multiple stages of rules.
rg-transform step1.rsx input.csv \
| rg-transform step2.rsx \
| rg-validate rules.rsx --output output.csv
- rg-transform step1.rsx input.csv: Applies the first set of transformations.
- rg-transform step2.rsx: Applies additional transformations to the intermediate output.
- rg-validate rules.rsx: Validates the transformed dataset according to the ruleset.
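Conceptually, each stage in the pipe consumes the previous stage's rows and hands its output to the next. The Python sketch below models that with plain functions over lists of dict rows; the three stage functions are hypothetical stand-ins for step1.rsx, step2.rsx, and rules.rsx, not what RuleGenius actually executes.

```python
from functools import reduce
from typing import Callable

Rows = list[dict]
Stage = Callable[[Rows], Rows]


def run_pipeline(rows: Rows, stages: list[Stage]) -> Rows:
    """Feed rows through each stage in order, like `a | b | c` in a shell."""
    return reduce(lambda acc, stage: stage(acc), stages, rows)


# Hypothetical stages standing in for step1.rsx, step2.rsx and rules.rsx.
def rename_price(rows: Rows) -> Rows:
    """rename "price_usd" to "price";"""
    return [{("price" if k == "price_usd" else k): v for k, v in r.items()} for r in rows]


def price_to_int(rows: Rows) -> Rows:
    """convert "price" to integer;"""
    return [{**r, "price": int(r["price"])} for r in rows]


def validate_price(rows: Rows) -> Rows:
    """validate "price": >= 0, on error "Price must be valid";"""
    return [{**r, "error": "" if r["price"] >= 0 else "Price must be valid"} for r in rows]


result = run_pipeline([{"price_usd": "200"}, {"price_usd": "-50"}],
                      [rename_price, price_to_int, validate_price])
```

Because each stage sees only the previous stage's output, a transformation step must produce every column that a later validation step refers to.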
4. On-Premises Deployment (Enterprise)
Deploy RuleScript locally to meet compliance and privacy requirements.
- How to Use:
  Deploy RuleGenius on your private servers or cloud containers to execute RuleScript rules securely within your environment. Keep your data private and compliant by avoiding transfers to third-party networks.
- Use Cases:
  - Process sensitive data within secure environments.
  - Comply with regulations like GDPR or HIPAA.
Why Use RuleScript?
RuleScript simplifies the creation of data transformation and validation rules, focusing on what matters: data quality. Here’s why you should adopt RuleScript for your workflows:
- Intuitive Syntax: Easy for data analysts and developers to learn and use.
- Unified Language for Rules: Combine transformations and validations in one cohesive syntax.
- Reusable and Scalable: Write once, reuse rules across different datasets and environments.
- Free and Flexible Access: Start with our free plan and scale up as your needs grow with RuleGenius subscriptions.
RuleScript’s Syntax
File extensions: .rs (source) and .rsx (compiled)
This is what RuleScript looks like:
split "full_name" into "first_name", "last_name" by " ";
rename "price_usd" to "price";
validate "price": number, on error "Price must be numeric";
validate "country": required, in ["United States", "Canada", "Mexico"], on error "Invalid country";
Example 1: Transforming Data
Input Dataset:
full_name | price_usd | country_code |
---|---|---|
John Doe | 200 | US |
Jane Smith | -50 | CA |
RuleScript:
rename "price_usd" to "price";
split "full_name" into "first_name", "last_name" by " ";
map "country_code" [US: "United States", CA: "Canada", MX: "Mexico"];
rename "country_code" to "country";
Output Dataset:
first_name | last_name | price | country |
---|---|---|---|
John | Doe | 200 | United States |
Jane | Smith | -50 | Canada |
Example 2: Validating Data
Input Dataset:
full_name | price | country |
---|---|---|
John Doe | 200 | United States |
Jane Smith | -50 | Canada |
RuleScript:
validate "price": number, min 0, max 10000, on error "Price must be valid";
validate "country": required, in ["United States", "Canada", "Mexico"], on error "Invalid country";
Validation Feedback:
full_name | price | country | error |
---|---|---|---|
John Doe | 200 | United States | |
Jane Smith | -50 | Canada | Price must be valid |
Example 3: Combining Transformations and Validations
Input Dataset:
order_id | full_name | price_usd | country_code |
---|---|---|---|
001 | John Doe | 200 | US |
002 | Jane Smith | -50 | CA |
003 | Mary Johnson | 15000 | MX |
RuleScript:
rename "price_usd" to "price";
split "full_name" into "first_name", "last_name" by " ";
map "country_code" [US: "United States", CA: "Canada", MX: "Mexico"];
rename "country_code" to "country";
validate "price": number, min 0, max 10000, on error "Price must be valid";
validate "country": required, in ["United States", "Canada", "Mexico"], on error "Invalid country";
Output Dataset:
order_id | first_name | last_name | price | country | error |
---|---|---|---|---|---|
001 | John | Doe | 200 | United States | |
002 | Jane | Smith | -50 | Canada | Price must be valid |
003 | Mary | Johnson | 15000 | Mexico | Price must be valid |
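To make the semantics of Example 3 concrete, the Python sketch below applies the same rules to the input rows and reproduces the error column. It is an illustration under stated assumptions, not the RuleGenius engine: codes missing from the map are left unchanged, the mapped values are written to a "country" column as in the output table, min and max are treated as inclusive, and several failures on one row are joined with "; ".

```python
COUNTRY_MAP = {"US": "United States", "CA": "Canada", "MX": "Mexico"}
ALLOWED_COUNTRIES = ["United States", "Canada", "Mexico"]


def apply_rules(row: dict) -> dict:
    """Apply the Example 3 rules to one input row and attach an error column."""
    out = {"order_id": row["order_id"]}
    # split "full_name" into "first_name", "last_name" by " ";
    out["first_name"], _, out["last_name"] = row["full_name"].partition(" ")
    # rename "price_usd" to "price";
    out["price"] = row["price_usd"]
    # map "country_code" [...]; unknown codes are kept as-is (assumption)
    # and the result is stored as "country", as in the output table.
    out["country"] = COUNTRY_MAP.get(row["country_code"], row["country_code"])

    errors = []
    # validate "price": number, min 0, max 10000, on error "Price must be valid";
    try:
        if not 0 <= float(out["price"]) <= 10000:
            errors.append("Price must be valid")
    except (TypeError, ValueError):
        errors.append("Price must be valid")
    # validate "country": required, in [...], on error "Invalid country";
    if out["country"] not in ALLOWED_COUNTRIES:
        errors.append("Invalid country")
    out["error"] = "; ".join(errors)  # several failures are joined (assumption)
    return out


rows = [
    {"order_id": "001", "full_name": "John Doe", "price_usd": "200", "country_code": "US"},
    {"order_id": "002", "full_name": "Jane Smith", "price_usd": "-50", "country_code": "CA"},
    {"order_id": "003", "full_name": "Mary Johnson", "price_usd": "15000", "country_code": "MX"},
]
for r in rows:
    print(apply_rules(r))
```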
Example 4: Conditional Validation
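Apply a validation only to rows that meet a condition. In this example, "discount" must be numeric only when "price" is greater than 100:
validate "discount": number, on error "Discount must be numeric", if "price" > 100;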
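One plausible reading of the `if` clause is that the rule is skipped entirely for rows where the condition is false. The Python sketch below assumes that reading and also assumes that a row whose "price" is missing or non-numeric does not trigger the rule; the helper names `is_number` and `check_discount` are hypothetical.

```python
from typing import Optional


def is_number(value) -> bool:
    """True if the value can be read as a number, e.g. 3 or "12.5"."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def check_discount(row: dict) -> Optional[str]:
    """Sketch of: validate "discount": number, on error "...", if "price" > 100;"""
    price = row.get("price")
    # Condition false (or price unusable): the rule does not apply to this row.
    if not (is_number(price) and float(price) > 100):
        return None
    if not is_number(row.get("discount")):
        return "Discount must be numeric"
    return None
```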
RuleScript Transformations
Transformations in RuleScript modify, clean, and reshape datasets. Below are descriptions of all available transformations.
1. Rename Field
Change the name of a column in your dataset. For example, rename "price_usd" to "price":
rename "price_usd" to "price";
- Parameters:
  - Required: "source_field", "target_field"
2. Split Field
Split the content of a column into multiple new columns based on a delimiter. For instance, splitting "full_name" into "first_name" and "last_name":
split "full_name" into "first_name", "last_name" by " ";
- Parameters:
  - Required: "field", target_fields (at least one target)
  - Optional: by (defaults to " ")
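This section does not say what happens when a value has more delimiters than there are target fields (for example "Mary Ann Johnson") or fewer. The Python sketch below picks one policy as an assumption: split at most `len(targets) - 1` times so the last target keeps the remainder, and fill missing parts with empty strings. Dropping the source column follows Example 1, whose output no longer contains "full_name"; the function name is hypothetical.

```python
def split_field(row: dict, field: str, targets: list, by: str = " ") -> dict:
    """Sketch of: split "<field>" into <targets...> by "<by>";"""
    # maxsplit keeps any extra text in the last target instead of dropping it.
    parts = row[field].split(by, len(targets) - 1)
    # Pad with "" when the value has fewer parts than there are targets.
    parts += [""] * (len(targets) - len(parts))
    out = {k: v for k, v in row.items() if k != field}  # drop the source column
    out.update(zip(targets, parts))
    return out
```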
3. Merge Fields
Combine the content of multiple columns into one. For example, merging "first_name" and "last_name" into "full_name" with a space separator:
merge "first_name", "last_name" into "full_name" with " ";
- Parameters:
  - Required: fields, "target_field"
  - Optional: with (defaults to " ")
4. Map Values
Replace values in a column based on a predefined mapping. For example, mapping "US" to "United States":
map "country_code" [US: "United States", CA: "Canada", MX: "Mexico"];
- Parameters:
  - Required: "field", mapping pairs (at least one mapping)
5. Convert Data Type
Change a column’s data type, such as converting a string to an integer. For example:
convert "quantity" to integer;
- Parameters:
  - Required: "field", data_type
6. Calculate New Field
Create a new column based on a formula or logic. For example, calculating a "total" field:
calculate "total" as "price * quantity";
- Parameters:
  - Required: "target_field", expression
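Because the expression is a string, an engine has to parse it and resolve column names against each row. The Python sketch below evaluates simple arithmetic such as "price * quantity" with the standard `ast` module, allowing only numbers, column names, unary minus, and the + - * / operators; that grammar is an assumption, since this page does not define it. Walking the syntax tree instead of calling `eval` keeps arbitrary code out of rule files.

```python
import ast
import operator

_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expression: str, row: dict) -> float:
    """Evaluate an arithmetic expression, resolving names as row columns."""

    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return float(row[node.id])  # column lookup; KeyError if missing
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return walk(ast.parse(expression, mode="eval").body)


def calculate(row: dict, target: str, expression: str) -> dict:
    """Sketch of: calculate "<target>" as "<expression>";"""
    return {**row, target: evaluate(expression, row)}
```

Column names containing spaces or punctuation would need a richer grammar than Python identifiers.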
7. Normalize Case
Standardize text to lowercase, uppercase, or title case. For instance, converting all text in "name" to lowercase:
normalize "name" to lowercase;
- Parameters:
  - Required: "field", case_type
8. Replace Values
Replace specific values in a column. For example, replacing NULL with "N/A":
replace "status" null with "N/A";
- Parameters:
  - Required: "field", replacement_value
  - Optional: value_to_replace (defaults to NULL)
9. Round Numbers
Round numeric values to a specific number of decimal places. For example, rounding "price" to two decimal places:
round "price" to 2 decimal places;
- Parameters:
  - Required: "field", decimal_places
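This section does not name a rounding mode, and the choice shows up in results. Python's built-in `round` works on binary floats and rounds exact halves to even, so `round(2.675, 2)` gives 2.67. The sketch below shows a half-up alternative using the `decimal` module, which many users expect for prices; which mode RuleGenius actually applies is not stated here, and `round_half_up` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value, places: int) -> Decimal:
    """Round to `places` decimals, with exact halves rounded away from zero."""
    # str() first so 2.675 is treated as the decimal text "2.675",
    # not as its binary approximation 2.67499999...
    quantum = Decimal(1).scaleb(-places)  # 10**-places, e.g. 1E-2 for two places
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)
```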
10. Anonymize Data
Mask or anonymize sensitive information in a column. For example, anonymizing email addresses:
anonymize "email" with "****@domain.com";
- Parameters:
  - Required: "field"
  - Optional: with (defaults to generic anonymization, such as replacing characters with asterisks)
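The default is described only as "replacing characters with asterisks". The Python sketch below picks one concrete policy as an assumption: mask every character before "@" (or the whole value if there is no "@"), keep the email domain, and return the explicit `with` value unchanged when one is given. The function name is hypothetical.

```python
from typing import Optional


def anonymize(value: str, replacement: Optional[str] = None) -> str:
    """Sketch of: anonymize "<field>" [with "<replacement>"];"""
    if replacement is not None:
        return replacement  # an explicit `with` value replaces the whole cell
    local, at, domain = value.partition("@")
    # Mask the local part (or the whole value when there is no "@"),
    # keeping the domain so emails remain groupable by provider.
    return "*" * len(local) + at + domain
```

A length-preserving mask still reveals how long the original value was; a fixed replacement, as in the `with "****@domain.com"` example, avoids that.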
11. Drop Columns
Remove unnecessary columns from the dataset. For example, dropping "temp_field":
drop "temp_field";
- Parameters:
  - Required: "field"
12. Impute Missing Values
Fill missing values in a column with a default, mean, or median value. For example, replacing missing values in "age" with the average:
impute "age" missing with average;
- Parameters:
  - Required: "field", replacement_type
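The example leaves open which values count as "missing" and whether the average is taken over the present values only. The Python sketch below assumes that `None`, empty strings, and absent keys are missing, and computes the fill value over the remaining values with `statistics.mean` or `statistics.median`; the strategy names mirror the prose above but are assumptions about the actual keywords.

```python
import statistics


def impute(rows: list, field: str, strategy: str = "average") -> list:
    """Sketch of: impute "<field>" missing with average|median;"""

    def missing(value) -> bool:
        return value is None or value == ""

    present = [float(r[field]) for r in rows if not missing(r.get(field))]
    if not present:
        return rows  # nothing to compute a fill value from; leave rows as-is
    fillers = {"average": statistics.mean, "median": statistics.median}
    fill = fillers[strategy](present)  # KeyError for an unknown strategy
    return [{**r, field: fill} if missing(r.get(field)) else r for r in rows]
```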
13. Reorder Columns
Change the order of columns in the dataset. For example, reordering columns to "id", "name", "email":
reorder columns "id", "name", "email";
- Parameters:
  - Required: fields (list of columns)
14. Filter Rows
Remove rows that do not meet specific conditions. For example, keeping only rows where "age" is greater than 18:
filter "age" > 18;
- Parameters:
  - Required: condition
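A filter keeps the rows for which its condition is true and drops the rest. The Python sketch below supports only the `"field" <op> value` shape shown above, with the four threshold operators used elsewhere on this page; treating rows with a missing or non-numeric value as failing the condition is an assumption, and `filter_rows` is a hypothetical name.

```python
import operator

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def filter_rows(rows: list, field: str, op: str, threshold: float) -> list:
    """Sketch of: filter "<field>" <op> <threshold>;"""
    compare = OPS[op]
    kept = []
    for row in rows:
        try:
            value = float(row[field])
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric values fail the condition (assumption)
        if compare(value, threshold):
            kept.append(row)
    return kept
```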
RuleScript Validation Functions
Validation functions ensure your data meets specified criteria. Below are all available functions with examples and parameters.
1. Required
Ensure a column is not empty or null. For example, ensuring "name" is always present:
validate "name": required, on error "Name is required";
- Parameters:
  - Required: "field"
  - Optional: on error
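Whether whitespace-only strings count as "empty" is not stated on this page. The Python sketch below assumes they do, so it returns the configured message for `None`, "", or "   " while accepting values such as 0; when no `on error` message is given it falls back to a default text that is purely hypothetical.

```python
from typing import Optional


def check_required(row: dict, field: str,
                   message: Optional[str] = None) -> Optional[str]:
    """Sketch of: validate "<field>": required[, on error "<message>"];"""
    value = row.get(field)
    if value is None or (isinstance(value, str) and not value.strip()):
        return message or f"{field} is required"  # default text is hypothetical
    return None
```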
2. Data Type Validation
Check that a column matches the expected data type (e.g., number, integer, string, date, time, datetime):
validate "age": number, on error "Age must be numeric";
- Parameters:
  - Required: "field", data_type
  - Optional: on error
3. Range Validation
Ensure numeric values fall within a range (min and max). For example:
validate "price": number, min 0, max 10000, on error "Price must be valid";
- Parameters:
  - Required: "field", min, max
  - Optional: on error
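This section does not say whether min and max are inclusive. The Python sketch below assumes they are (`min <= value <= max`) and folds in the `number` check, so non-numeric input fails with the same message. It agrees with Example 2, where 200 passes and -50 fails; the function name and default message are taken from that example for illustration only.

```python
from typing import Optional


def check_range(value, lo: float, hi: float,
                message: str = "Price must be valid") -> Optional[str]:
    """Sketch of: validate "<field>": number, min <lo>, max <hi>, on error "...";"""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return message  # fails the `number` check
    # Bounds treated as inclusive (assumption).
    return None if lo <= number <= hi else message
```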
4. In List
Ensure a column’s value is part of a predefined list. For example:
validate "country": in ["United States", "Canada", "Mexico"], on error "Invalid country";
- Parameters:
  - Required: "field", list
  - Optional: on error
5. Greater Than (>), Greater or Equal (>=)
Ensure values are above a specific threshold. For example:
validate "discount": > 0, on error "Discount must be positive";
validate "price": >= 0, on error "Price must be non-negative";
- Parameters:
  - Required: "field", threshold value
  - Optional: on error
6. Less Than (<), Less or Equal (<=)
Ensure values are below a specific threshold. For example:
validate "age": < 18, on error "Age must be below 18";
validate "discount": <= 50, on error "Discount must be 50% or less";
- Parameters:
  - Required: "field", threshold value
  - Optional: on error
7. Equal and Not Equal
Ensure a value equals a specific value (equal) or is not one of a list of values (not in). For example:
validate "status": equal "Active", on error "Status must be active";
validate "status": not in ["Inactive", "Suspended"], on error "Invalid status";
- Parameters:
  - Required: "field", value or condition
  - Optional: on error
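Functions 4 to 7 share one shape: compare a field against a threshold, a value, or a list, and return the `on error` message on failure. A lookup table of comparison functions expresses that compactly. In this Python sketch the keys mirror the RuleScript keywords shown above, while coercing values to numbers for the threshold operators (and failing non-numeric input) is an assumption.

```python
import operator
from typing import Any, Optional

CHECKS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "equal": operator.eq,
    "in": lambda value, options: value in options,
    "not in": lambda value, options: value not in options,
}
NUMERIC = {">", ">=", "<", "<="}


def check(value: Any, op: str, operand: Any, message: str) -> Optional[str]:
    """Sketch of: validate "<field>": <op> <operand>, on error "<message>";"""
    if op in NUMERIC:
        try:
            value = float(value)
        except (TypeError, ValueError):
            return message  # non-numeric values fail threshold checks (assumption)
    return None if CHECKS[op](value, operand) else message
```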
Get Started with RuleScript Today!
Sign Up for Free
Experience the power of RuleScript with our free plan, perfect for small projects or testing new ideas. Gain access to essential features and start improving your data quality today.
Download the CLI Tool
Easily process datasets locally or in pipelines with the RuleGenius CLI tool. It’s quick to install and well suited to on-the-go data transformations and validations.
Explore the API Documentation
Learn how to seamlessly integrate RuleScript into your systems using our powerful API. Automate workflows, validate data dynamically, and scale effortlessly.
Request an Enterprise Demo
Discover how RuleScript and RuleGenius can transform your enterprise data workflows. See how it fits your needs for security, scalability, and compliance.