Mapping tables and attributes
Overview
Map is the first step of the Stitch process for data unification. After the Dataflow completes successfully, map the attributes of the imported data to their semantic labels. Semantic mapping brings the data into the Modern Data Stack Platform (MDSP) and prepares it for the Match process. The Map stage consists of four steps:
- Table selection: Identify and select the tables that contain complete information about your customers.
- Attribute selection: Identify and select the most accurate attributes for each table.
- Primary key and semantic type selection: Identify and select a unique primary key for each table and identify the semantic type for each attribute.
- Deduplication of Map tables: Identify the duplicated records and cleanse your data by eliminating redundancy.
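Before choosing a primary key in the third step, it can help to check a candidate column outside the platform. This page states that the key must not contain duplicate, missing, or null values. The following is a minimal sketch of such a check, not part of the SkyPoint product; the function name `check_primary_key`, the sample rows, and the column names are hypothetical.

```python
from collections import Counter

def check_primary_key(rows, column):
    """Report whether `column` is usable as a unique key for `rows`.

    Hypothetical helper: treats None and blank strings as missing values.
    """
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None and str(v).strip() != ""]
    missing = len(values) - len(present)
    counts = Counter(present)
    # Values that occur more than once disqualify the column as a key.
    duplicates = sorted((v for v, n in counts.items() if n > 1), key=str)
    return {
        "missing": missing,
        "duplicates": duplicates,
        "is_unique_key": missing == 0 and not duplicates,
    }

# Illustrative data only.
customers = [
    {"customer_id": "C1", "email": "a@example.com"},
    {"customer_id": "C2", "email": "b@example.com"},
    {"customer_id": "C3", "email": "a@example.com"},
    {"customer_id": "C4", "email": None},
]

id_report = check_primary_key(customers, "customer_id")
email_report = check_primary_key(customers, "email")
```

In this sample, `customer_id` passes the check, while `email` fails because one value repeats and one is missing.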
To map tables and attributes
- In the left pane, go to the Resolve menu and click Stitch.
By default, the Map window appears.
- Click Select tables.
- Select the tables that you want to map and include all required attributes. You can search by keyword across all tables and attributes.
❕ Note: At least two tables are needed to match and stitch.
- Select Apply to confirm your changes. The selected tables and attributes are displayed on the Map page.
Select primary key, candidate key, and semantic type for attributes
Follow these steps to start the Map process:
- Select the Primary key from the drop-down list.
- Check Select candidate key.
- Select the Candidate key from the drop-down list.
- In the Review mapped field, select the semantic type for attributes.
❕ Note: The system automatically maps semantic labels using its built-in algorithms. Mapped attributes are displayed in the upper table and unmapped attributes in the lower table. Attributes that aren't automatically mapped to a semantic type are listed under Define the data in the unmapped fields. You can select a semantic type for an unmapped attribute to move it to the mapped table, or leave it unmapped. For the primary key, select the attribute that uniquely identifies a record. It must not contain duplicate, missing, or null values.
- To configure data cleansing techniques, such as converting text to lower or upper case or removing spaces within strings, select Advanced settings. Default normalizations are added for the first name, last name, email address, and phone number fields to reduce data redundancy and eliminate undesirable characters.
As a best practice, the following normalizations apply to these attributes:
First Name - Text to Pascal, Left Trim, Right Trim, Regex Replace ([^a-zA-Z ])
Last Name - Text to Pascal, Left Trim, Right Trim, Regex Replace ([^a-zA-Z ])
Email Address - Text to lower case, Remove WhiteSpace, Left Trim, Right Trim
Phone Number - Left Trim, Right Trim, Regex Replace ([^0-9]+)
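The default normalizations listed above can be pictured with a short Python sketch. This illustrates the effect of the rules and is not the platform's implementation. It rests on three assumptions: Regex Replace removes every match, Text to Pascal capitalizes the first letter of each word and lowercases the rest, and the operations run in the order shown. The function names are hypothetical.

```python
import re

def normalize_name(value):
    # Regex Replace ([^a-zA-Z ]): drop everything except letters and spaces
    # (assumption: matches are replaced with an empty string).
    cleaned = re.sub(r"[^a-zA-Z ]", "", value)
    # Left Trim and Right Trim.
    cleaned = cleaned.strip()
    # Text to Pascal (assumption: capitalize each word, lowercase the rest).
    return re.sub(r"[a-zA-Z]+", lambda m: m.group().capitalize(), cleaned)

def normalize_email(value):
    # Text to lower case, then Remove WhiteSpace. Removing all whitespace
    # also covers Left Trim and Right Trim.
    return re.sub(r"\s+", "", value.lower())

def normalize_phone(value):
    # Left Trim and Right Trim, then Regex Replace [^0-9]+ to keep digits only.
    return re.sub(r"[^0-9]+", "", value.strip())

# Illustrative inputs only.
name = normalize_name("  jOHN3 ")
multi_word_name = normalize_name(" de la cruz ")
email = normalize_email("  John.Doe @Example.COM ")
phone = normalize_phone(" +1 (555) 010-9999 ")
```

Applying the same rules to every source table means that values such as `John.Doe@Example.COM` and `john.doe@example.com` compare as equal in later steps.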
Tip
For DateTime fields mapped to Calendar.Date, Last.Modified.Date, or Person.BirthDate, the "Default DateTime" option in Map > Advanced settings is selected by default. With this option selected, DateTime fields on the profile card use the DateTime format set in the instance settings. To see DateTime values on the profile card as ingested from the source, clear the option in Map > Advanced settings.
- Click each table and complete the mapping for the required fields.
- Click Save.
- Once the application displays Saved successfully, click Run to start the mapping process.
❕ Note: The first time you run Map, the system performs a full run. For subsequent runs, you can choose between a Full run and an Incremental run. An Incremental run processes only the data that was refreshed after the last run, so it returns results faster.
- To perform an Incremental run, select the date attribute mapped to the Last.Modified.Date type.
- Choose an appropriate option from the drop-down to run the Map process.
- If necessary, do the following:
If you want to | Then |
---|---|
Discard all the changes from the last save | Click Discard Changes. |
Cancel the Run that is in progress | Click Cancel Run. |
Add or remove the attributes | Click Edit. |
Discard advanced settings | Click Clear advanced settings. |
See the run history of the mapping process | Click Run History. |
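The difference between a Full run and an Incremental run, described in the note earlier in this procedure, can be pictured with the sketch below. It is a conceptual illustration only: how the platform selects records internally is not documented here, and the function name `select_rows_for_run`, the `last_modified` field, and the sample dates are hypothetical.

```python
from datetime import datetime

def select_rows_for_run(rows, last_run_at=None, modified_field="last_modified"):
    """Return the rows a run would process.

    Assumption: with no previous run, every row is processed (full run);
    otherwise only rows modified after the previous run are processed.
    """
    if last_run_at is None:
        return list(rows)
    return [row for row in rows if row[modified_field] > last_run_at]

# Illustrative data only.
rows = [
    {"id": 1, "last_modified": datetime(2024, 1, 5)},
    {"id": 2, "last_modified": datetime(2024, 2, 10)},
    {"id": 3, "last_modified": datetime(2024, 3, 1)},
]
last_run = datetime(2024, 2, 1)

full_run_rows = select_rows_for_run(rows)
incremental_rows = select_rows_for_run(rows, last_run)
incremental_ids = [row["id"] for row in incremental_rows]
```

This is why an Incremental run depends on a date attribute with the Last.Modified.Date type: without a modification date there is nothing to compare against the time of the last run.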
Tip
The SkyPoint platform provides an Intelligent mapping feature that automatically determines the semantic label for each attribute of a table. Intelligent mapping predicts semantics, saves time, and improves accuracy.
- To activate it, turn on Intelligent mapping.
Deduplicating the record details
The Deduplication process identifies duplicated records and cleanses your data by eliminating redundant data. For example, suppose there are 100 data records and 50 of them are duplicates. To unify the data, you must remove the duplicates so that only the 50 unique records pass to the next step in the process.
To identify and resolve the duplicated records
- Go to Resolve > Stitch > Map.
- In the Duplicated records details section, select Set tables.
❕ Note: If deduplication rules are already set, select Edit to modify the tables.
- In the Duplicate preferences pane, select the Tables.
Item | Description |
---|---|
Column Type | The column type used to identify duplicated data. |
Record to keep | The record to keep among duplicates: the most populated, the most recent, or the least recent. |
Based on field(s) | The attribute field(s) on which the selection is based, for example the most populated fields. |
- Choose the deduplication criteria in Column Type, Record to keep, and Based on field(s).
- Select Done.
- Click Save > Run.
Now, you can view the duplicated record details. These tables can be found on the Databases page of the Map lists.
- To modify the deduplication preferences, click Edit, make your changes, and select Done to apply them.
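The Record to keep preference can be pictured with a short Python sketch. It groups records that share a key and keeps one record per group according to the chosen preference. This is an illustration under assumptions, not the platform's algorithm: the function names, the `email` key, the `updated` date field, and the sample records are all hypothetical.

```python
from datetime import date

def populated_count(record):
    # Number of fields that hold a value (None and "" count as empty).
    return sum(1 for value in record.values() if value not in (None, ""))

def deduplicate(records, key_field, keep="most_populated", date_field="updated"):
    """Keep one record per distinct `key_field` value."""
    groups = {}
    for record in records:
        groups.setdefault(record[key_field], []).append(record)

    choosers = {
        "most_populated": lambda group: max(group, key=populated_count),
        "most_recent": lambda group: max(group, key=lambda r: r[date_field]),
        "least_recent": lambda group: min(group, key=lambda r: r[date_field]),
    }
    choose = choosers[keep]
    # Dicts preserve insertion order, so groups keep first-seen order.
    return [choose(group) for group in groups.values()]

# Illustrative data only: the first two records are duplicates by email.
records = [
    {"email": "a@example.com", "phone": "", "city": None, "updated": date(2024, 1, 1)},
    {"email": "a@example.com", "phone": "5550100", "city": "Austin", "updated": date(2023, 6, 1)},
    {"email": "b@example.com", "phone": "5550101", "city": "Boise", "updated": date(2024, 2, 1)},
]

by_populated = deduplicate(records, "email")
by_recent = deduplicate(records, "email", keep="most_recent")
by_oldest = deduplicate(records, "email", keep="least_recent")
```

In this sample, all three preferences reduce three records to two, but they differ in which of the duplicate records survives: the most populated preference keeps the record with a phone and city, while the most recent preference keeps the sparser record updated in 2024.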