Examples

The following examples illustrate some of the features of execsql.

Example 1: Use Temporary Queries to Select and Summarize Data in Access

This example illustrates a script that makes use of several temporary queries to select and summarize data, and a final query that prepares the data for export or further use. The SQL in this example is specific to MS-Access.

-- --------------------------------------------------------------------
-- Get result records that meet specific selection criteria.
-- --------------------------------------------------------------------
create temporary view v_seldata as
select
  smp.sys_sample_code, rs.test_surrogate_key,
  rs.cas_rn, tst.lab_anl_method_name,
  iif(rs.detect_flag='N', rs.method_detection_limit, rs.result_value) as conc,
  rs.detect_flag='Y' as detected,
  rs.lab_qualifiers like '*J*' as estimated,
  iif(rs.detect_flag='N', rs.detection_limit_unit, rs.result_unit) as unit
from
  ((((dt_result as rs
  inner join dt_test as tst on tst.test_surrogate_key=rs.test_surrogate_key)
  inner join dt_sample as smp on smp.sys_sample_code=tst.sys_sample_code)
  inner join dt_field_sample as fs on fs.sys_sample_code=smp.sys_sample_code)
  inner join dt_location as loc on loc.sys_loc_code=fs.sys_loc_code)
  inner join rt_analyte as anal on anal.cas_rn=rs.cas_rn
where
  (loc.loc_name like 'SG*' or loc.loc_name like 'SC*')
  and smp.sample_type_code='N'
  and smp.sample_matrix_code='SE'
  and anal.analyte_type in ('ABN', 'PEST', 'PCB', 'LPAH', 'HPAH')
  and rs.reportable_result='Yes'
  and not (rs.result_value is null and rs.method_detection_limit is null);

-- --------------------------------------------------------------------
-- Summarize by sample, taking nondetects at half the detection limit.
-- --------------------------------------------------------------------
create temporary view v_samp as
select
  sys_sample_code, cas_rn, lab_anl_method_name,
  Avg(iif(detected, conc, conc/2.0)) as concentration,
  Max(iif(detected is null, 0, detected)) as detect,
  Min(iif(estimated is null, 0, estimated)) as estimate,
  unit
from
  v_seldata
group by
  sys_sample_code, cas_rn, lab_anl_method_name, unit;

-- --------------------------------------------------------------------
-- Pull in sample location and date information, decode analyte,
-- and reconstruct qualifiers.
-- --------------------------------------------------------------------
select
  loc.loc_name, fs.sample_date, fs.start_depth, fs.end_depth, fs.depth_unit,
  smp.sample_name, anal.chemical_name, dat.lab_anl_method_name,
  iif(dat.detect, concentration, concentration/2.0) as conc,
  (iif(detect, "", "U") & iif(estimate, "J", "")) as qualifiers,
  unit
from
  (((v_samp as dat
  inner join dt_sample as smp on dat.sys_sample_code=smp.sys_sample_code)
  inner join dt_field_sample as fs on fs.sys_sample_code=smp.sys_sample_code)
  inner join dt_location as loc on loc.sys_loc_code=fs.sys_loc_code)
  inner join rt_analyte as anal on anal.cas_rn=dat.cas_rn;

During the execution of this script with Access, the temporary queries will be created in the database. When the script concludes, the temporary queries will be removed. Nothing except the data itself need be kept in the database to use a script like this one.

Example 2: Execute a Set of QA Queries and Capture the Results

This example illustrates a script that creates several temporary queries to check the codes that are used in a set of staging tables against the appropriate dictionary tables, and, if there are unrecognized codes, writes them out to a text file.

create temporary view qa_pty1 as
select distinct stage_party.party_type
from   stage_party
       left join e_partytype
         on stage_party.party_type=e_partytype.party_type
where  e_partytype.party_type is null;

-- !x! if(hasrows(qa_pty1))
-- !x!   write "Unrecognized party types:" to Staging_QA_!!$DATE_TAG!!.txt
-- !x!   export qa_pty1 append to Staging_QA_!!$DATE_TAG!!.txt as tab
-- !x! endif

create temporary view qa_prop1 as
select distinct stage_property.property_type
from   stage_property
       left join e_propertytype
         on stage_property.property_type=e_propertytype.property_type
where  e_propertytype.property_type is null;

-- !x! if(hasrows(qa_prop1))
-- !x!   write "Unrecognized property types:" to Staging_QA_!!$DATE_TAG!!.txt
-- !x!   export qa_prop1 append to Staging_QA_!!$DATE_TAG!!.txt as tab
-- !x! endif

create temporary view qa_partyprop1 as
select distinct stage_partyprop.property_rel
from   stage_partyprop
       left join e_partyproprel
         on stage_partyprop.property_rel=e_partyproprel.property_rel
where  e_partyproprel.property_rel is null;

-- !x! if(hasrows(qa_partyprop1))
-- !x!   write "Unrecognized party-property relationship types:" to Staging_QA_!!$DATE_TAG!!.txt
-- !x!   export qa_partyprop1 append to Staging_QA_!!$DATE_TAG!!.txt as tab
-- !x! endif

Example 3: Execute a Set of QA Queries and Display the Results with a Prompt

This example illustrates a script that compiles the results of several QA queries into a single temporary table, then displays the temporary table if it has any rows (i.e., any errors were found), and prompts the user to cancel or continue the script.

create temporary table qa_results (
  table_name varchar(64) not null,
  column_name varchar(64) not null,
  severity varchar(20) not null,
  data_value varchar(255) not null,
  description varchar(255) not null,
  time_check_run datetime not null
  );

insert into qa_results
  (table_name, column_name, severity, data_value, description, time_check_run)
select distinct
       'stage_party',
       'party_type',
       'Fatal',
       stage_party.party_type,
       'Unrecognized party type',
       cast('!!$CURRENT_TIME!!' as datetime)
from   stage_party
       left join e_partytype
         on stage_party.party_type=e_partytype.party_type
where  e_partytype.party_type is null;

insert into qa_results
  (table_name, column_name, severity, data_value, description, time_check_run)
select distinct
      'stage_property',
      'property_type',
      'Fatal',
      stage_property.property_type,
      'Unrecognized property type',
      cast('!!$CURRENT_TIME!!' as datetime)
from  stage_property
      left join e_propertytype
        on stage_property.property_type=e_propertytype.property_type
where e_propertytype.property_type is null;

insert into qa_results
  (table_name, column_name, severity, data_value, description, time_check_run)
select distinct
      'stage_partyprop',
      'property_rel',
      'Fatal',
      stage_partyprop.property_rel,
      'Unrecognized property relationship type',
      cast('!!$CURRENT_TIME!!' as datetime)
from  stage_partyprop
      left join e_partyproprel
        on stage_partyprop.property_rel=e_partyproprel.property_rel
where  e_partyproprel.property_rel is null;

-- !x! if(hasrows(qa_results)) { prompt message "Cancel if there are any fatal errors." display qa_results }

Example 4: Include a File if it Exists

This example illustrates how a script can be modified by the inclusion of an additional script, but only if that additional script file exists. This technique might be used when a general-purpose script processes many data sets and some data sets need special handling: the data-set-specific processing is coded in a separate script file, which is read into the main script only if it exists.

Each data set to be processed is identified by a unique name, which is defined with a SUB command in a script that is also read into the main script. The definition of the data set name might look like this, in a file named ds_name.sql:

-- !x! sub DS_NAME ALS4110-52

The main script then would look like this:

-- !x! include ds_name.sql
-- !x! if(file_exists(!!DS_NAME!!_fixes.sql)) { include !!DS_NAME!!_fixes.sql }

Example 5: Include a File if a Table Exists

Similar to Example 4, this example illustrates how a script file can be included if a database table exists. This might be used when carrying out quality assurance checks of data sets that have optional components. In this case, if an optional component has been loaded into a staging table, the script to check that component will be included.

-- !x! if(table_exists(staging.bioaccum_samp)) { include qa_bioaccumsamp.sql }

Example 6: Looping

Although execsql does not have any metacommands specifically for looping through groups of SQL statements or metacommands, the IF metacommand can be used with either the INCLUDE or EXECUTE SCRIPT metacommand to perform looping. Commands to be executed within a loop must be in a separate script (a separate file if the INCLUDE metacommand is used, or a script block defined with the BEGIN/END SCRIPT metacommands), and that script should end with another INCLUDE or EXECUTE SCRIPT metacommand to continue the loop, or omit that re-invocation to exit the loop.

This approach implements tail recursion. Either a single-line IF metacommand can be used, as shown here, or the script’s recursive invocation of itself can be contained within a block IF statement.
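The control flow of this pattern can be sketched in ordinary Python (the function names and the fixed iteration limit are illustrative only; execsql itself re-reads the inner script file rather than calling a function):

```python
# A minimal sketch of the tail-recursion pattern: each "inner script" run
# does one iteration's work, then decides whether to re-invoke itself.
# The iteration counter mirrors execsql's !!$counter_1!! variable.

def run_inner_script(counter, limit, log):
    """One pass through the inner loop script."""
    log.append("Loop iteration number %d" % counter)
    # The final metacommand in the inner script either re-includes the
    # script (continuing the loop) or does nothing (ending the loop).
    if counter < limit:
        run_inner_script(counter + 1, limit, log)

def run_main_script(limit):
    log = ["Before looping starts, we can do some stuff."]
    run_inner_script(1, limit, log)   # -- !x! include loop_inner.sql
    log.append("After looping is over, we can do more stuff.")
    return log

messages = run_main_script(3)
```

Because each iteration's re-invocation is the last action taken, no state needs to be carried back up through the chain of includes when the loop ends.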

A script to control a loop would invoke the inner loop script as follows:

-- !x! write "Before looping starts, we can do some stuff."
-- !x! include loop_inner.sql
-- !x! write "After looping is over, we can do more stuff."

In this example, the inner part of the loop is contained in a script file named loop_inner.sql. The inner loop script should have a structure like:

-- !x! write "Loop iteration number !!$counter_1!!"
-- !x! prompt ask "Do you want to loop again?" sub loop_again
-- !x! if(equals("!!loop_again!!", "Yes")) { include loop_inner.sql }

Termination of the loop may be controlled by some data condition instead of by an interactive prompt to the user. For example, you could loop for as many times as there are rows in a table by using the SUBDATA metacommand to get a count of all of the rows in a table, and then use the IF(EQUALS()) conditional test to terminate the loop when a counter variable equals the number of rows in the table.

Every loop iteration increases the size of the script in memory, so execsql deallocates the memory used for script commands that have already been executed, to minimize the possibility of an out-of-memory error.

To iterate is human, to recurse divine.
– L Peter Deutsch

Example 7: Nested Variable Evaluation

This example illustrates nested evaluation of substitution variables, using scripts that print out all of the substitution variables that are assigned with the “-a” command-line option.

Because there may be an indefinite number of command-line variable assignments, a looping technique is used to evaluate them all. The outer level script that initiates the loop is simply:

-- !x! include arg_vars_loop.sql

The script that is called, arg_vars_loop.sql, is:

-- !x! sub argvar $ARG_!!$counter_1!!
-- !x! if(sub_defined(!!argvar!!))
-- !x!   write "Argument variable !!argvar!! is: !!!!argvar!!!!"
-- !x!   include arg_vars_loop.sql
-- !x! endif

On line 3 of this script the substitution variable argvar is first evaluated to generate a name for a command-line variable, consuming the inner pair of exclamation points. The resulting variable (which will take on values of “$ARG_1”, “$ARG_2”, etc.) will then be evaluated, yielding the value of the command-line variable assignment.
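The two evaluation passes can be sketched in Python (the variable names and values here are hypothetical, and execsql's actual substitution logic differs in detail):

```python
import re

# Hypothetical substitution-variable assignments: "argvar" holds the name
# of a command-line variable, which in turn holds a value.
subs = {"argvar": "$ARG_2", "$ARG_1": "alpha", "$ARG_2": "beta"}

def substitute_once(text):
    """Replace each !!name!! whose name is defined; leave others intact."""
    return re.sub(r"!!([^!]+)!!",
                  lambda m: subs.get(m.group(1), m.group(0)),
                  text)

line = "Argument variable !!argvar!! is: !!!!argvar!!!!"
pass1 = substitute_once(line)   # inner pair consumed: !!argvar!! -> $ARG_2
pass2 = substitute_once(pass1)  # then !!$ARG_2!! -> beta
```

The first pass consumes the inner pair of exclamation points, and the second pass resolves the variable name that the first pass produced.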

Example 8: Prompt the User to Choose an Option

This example illustrates how the PROMPT SELECT_SUB metacommand can be used to prompt the user to select among several options. In this example, the options allow the user to choose a format in which to (export and) view a data table or view. For this example, there must be a data table or view in the database named some_data.

drop table if exists formats;
create temporary table formats ( format varchar(4) );
insert into formats (format)
values ('csv'), ('tsv'), ('ods'), ('html'), ('txt'), ('pdf'), ('GUI');

-- !x! sub data_table some_data
-- !x! prompt select_sub formats message "Choose the output format you want."
-- !x! if(equals("!!@format!!", "GUI"))
-- !x!   prompt message "Selected data." display !!data_table!!
-- !x! else
-- !x!   sub outfile outfile_!!$DATETIME_TAG!!.!!@format!!
-- !x!   if(equals("!!@format!!", "pdf"))
-- !x!     sub txtfile outfile_!!$DATETIME_TAG!!.txt
-- !x!     write "# Data Table" to !!txtfile!!
-- !x!     export !!data_table!! append to !!txtfile!! as txt
-- !x!     system_cmd(pandoc !!txtfile!! -o !!outfile!!)
-- !x!   else
-- !x!     export !!data_table!! to !!outfile!! as !!@format!!
-- !x!   endif
-- !x!   system_cmd(xdg-open !!outfile!!)
-- !x! endif

This example also illustrates that, because the text (“txt”) output format of the EXPORT metacommand creates a Markdown-compatible table, this type of text output can be combined with output of WRITE metacommands and converted to Portable Document Format (PDF). This example also illustrates how the SYSTEM_CMD metacommand can be used to immediately open and display a data file that was just exported. (Note that the xdg-open command is available in most Linux desktop environments. In Windows, the start command is equivalent.)

This example also illustrates how substitution variables can be used to parameterize code to support modularization and code re-use. In this example the substitution variable data_table is assigned a value at the beginning of the script. Alternatively, this variable might be assigned different values at different locations in a main script, and the commands in the remainder of this example placed in a second script that is INCLUDEd where appropriate to allow the export and display of several different data tables or views. Example 10 illustrates this usage.

Example 9: Using Command-Line Substitution Variables

This example illustrates how substitution variables that are assigned on the command line using the “-a” option can be used in a script.

This example presumes the existence of a SQLite database named todo.db, and a table in that database named todo with columns named todo and date_due. The following script allows a to-do item to be added to the database by specifying the text of the to-do item and its due date on the command line:

-- !x! if(sub_defined($arg_1))
-- !x!   if(sub_defined($arg_2))
insert into todo (todo, date_due) values ('!!$arg_1!!', '!!$arg_2!!');
-- !x!   else
insert into todo (todo) values ('!!$arg_1!!');
-- !x!   endif
-- !x! endif

This script can be used with a command line like:

execsql.py -tl -a "Share your dog food" -a 2015-11-21 add.sql todo.db

Example 10: Using CANCEL_HALT to Control Looping with Dialogs

This example illustrates the use of the CANCEL_HALT metacommand during user interaction with dialogs. Ordinarily when a user presses the “Cancel” button on a dialog, execsql treats this as an indication that a necessary response was not received, and that further script processing could have adverse consequences—and therefore execsql halts script processing. However, there are certain cases when the “Cancel” button is appropriately used to terminate a user interaction without stopping script processing.

The scripts in this example present the user with a list of all views in the database, allow the user to select one, and then prompt the user to choose how to see the data. Three scripts are used:

  • view_views.sql: This is the initial script that starts the process. It turns the CANCEL_HALT flag off at the start of the process, and turns it back on again at the end.
  • view_views2.sql: This script is included by view_views.sql, and acts as an inner loop, repeatedly presenting the user with a list of all the views in the database. The “Cancel” button on this dialog is used to terminate the overall process. If the user selects a view, rather than canceling the process, then the choose_view.sql script is INCLUDEd to allow the user to choose how to see the data.
  • choose_view.sql: This script presents the dialog that allows the user to choose how to see the data from the selected view. This is the same script used in Example 8, except that the data_table variable is defined in the view_views2.sql script instead of in choose_view.sql.

view_views.sql

create temporary view all_views as
select table_name from information_schema.views;

-- !x! cancel_halt off
-- !x! include view_views2.sql
-- !x! cancel_halt on

view_views2.sql

-- !x! prompt select_sub all_views message "Select the item you would like to view or export; Cancel to quit."
-- !x! if(sub_defined(@table_name))
-- !x!    sub data_table !!@table_name!!
-- !x!    include choose_view.sql
-- !x!    include view_views2.sql
-- !x! endif

The choose_view.sql script can be seen in Example 8.

The CONTINUE keyword of the PROMPT SELECT_SUB metacommand can also be used to close the dialog without canceling the script.

Example 11: Output Numbering with Counters

This example illustrates how counter variables can be used to automatically number items. This example shows automatic numbering of components of a Markdown document, but the technique can also be used to number database objects such as tables and views.

This example creates a report of the results of a set of QA checks, where the information about the checks is contained in a table with the following columns:

  • check_number: An integer that uniquely identifies each QA check that is conducted.
  • test_description: A narrative description of the scope or operation of the check.
  • comment_text: A narrative description of the results of the check.

The results of each check are also represented by tabular output that is saved in a table named qa_tbl_x where x is the check number.

-- !x! sub section !!$counter_1!!
-- !x! sub check_no !!$counter_3!!
-- !x! write "# Section !!section!!. QA Check !!check_no!!" to !!outfile!!
-- !x! reset counter 2
-- !x! write "## Subsection !!section!!.!!$counter_2!!. Test" to !!outfile!!
create or replace temporary view descript as
select test_description from qa_results where check_number = !!check_no!!;
-- !x! export descript append to !!outfile!! as plain
-- !x! write "## Subsection !!section!!.!!$counter_2!!. Results" to !!outfile!!
create or replace temporary view comment as
select comment_text from qa_results where check_number = !!check_no!!;
-- !x! export comment append to !!outfile!! as plain
-- !x! write "Table !!check_no!!." to !!outfile!!
-- !x! write "" to !!outfile!!
-- !x! export qa_tbl_!!check_no!! append to !!outfile!! as txt-nd
-- !x! write "" to !!outfile!!

A script like this one could be INCLUDEd as many times as there are sets of QA results to report.

This example also illustrates how the value of a counter variable can be preserved for repeated use by assigning it to a user-defined substitution variable.

Example 12: Customize the Table Structure for Data to be Imported

This example illustrates how the structure of a table that would be created by the IMPORT metacommand can be customized during the import process. Customization may be necessary because the data types that are automatically selected for the columns of the new table need to be modified. This may occur when:

  • A column is entirely null. In this case, execsql will create the column with a text data type, whereas a different data type may be more appropriate.
  • A column contains only integers of 1 and 0; execsql will create this column with a Boolean data type, whereas an integer type may be more appropriate.
  • A column contains only integers, whereas a floating-point type may be more appropriate.
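A simplified sketch of this kind of type inference, using made-up inference rules rather than execsql's actual ones, shows how each of the cases above can produce a narrower type than intended:

```python
# Hypothetical inference rules, not execsql's actual implementation.
def infer_type(values):
    """Guess a SQL type for a column from its values."""
    nonnull = [v for v in values if v is not None]
    if not nonnull:
        return "text"       # all null: nothing to infer, default to text
    if all(v in (0, 1) for v in nonnull):
        return "boolean"    # only 0s and 1s: looks Boolean
    if all(isinstance(v, int) for v in nonnull):
        return "integer"    # only integers, even if floats were intended
    return "double precision"

col_types = {
    "notes":   infer_type([None, None, None]),  # may really be a date, say
    "flagged": infer_type([0, 1, 1]),           # may really be an integer
    "depth":   infer_type([1, 2, 10]),          # may really be a float
}
```

In each of these three cases the inferred type is defensible given the sample data, which is exactly why hand-editing the generated CREATE TABLE statement can be necessary.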

The technique shown here first writes the CREATE TABLE statement to a temporary file, and then opens that file in an editor so that you can make changes. After the file is edited and closed, the file is INCLUDEd to create the table structure, and then the data are loaded into that table.

-- !x! sub input_file storm_tracks.csv
-- !x! sub target staging.storm_data

-- !x! sub_tempfile tmpedit
-- !x! write create_table !!target!! from !!input_file!! comment "Modify the structure as needed." to !!tmpedit!!
-- !x! system_cmd(vim !!tmpedit!!)
-- !x! include !!tmpedit!!
-- !x! import to !!target!! from !!input_file!!

Changes to data types that are incompatible with the data to be loaded will result in an error during the import process. Changes to column names will also prevent the data from being imported.

Although this example shows this process applied to only a single file/table, multiple CREATE TABLE statements can be written into a single file and edited all at once.

This example illustrates the use of a temporary file for the CREATE TABLE statement, although you may wish to save the edited form of this statement in a permanent file to keep a record of all data-handling operations.

Example 13: Import All the CSV Files in a Directory

When a group of related data files is to be loaded together into a database, the files can all be loaded automatically with this script if they are first placed in the same directory. This example script operates by:

  • Prompting for the directory containing the CSV files to load.
  • Creating a text file with the names of all of the CSV files in that directory.
  • Importing the text file into a database table.
  • Adding columns to that table for the name of the table into which each CSV file is imported and the time of import. The main file name of each CSV file is used as the table name.
  • Looping over the list of CSV files, choosing one that does not have an import time, importing that file, and setting the import time.

This process uses two script files. The first one obtains the list of CSV files, and the second one acts as the inner part of the loop, repeatedly loading a single CSV file. The main script looks like this:

-- !x! prompt directory sub dir
-- !x! sub_tempfile csvlist
-- !x! write "csvfile" to !!csvlist!!
-- !x! system_cmd(sh -c "ls !!dir!!/*.csv >>!!csvlist!!")
-- !x! import to new staging.csvlist from !!csvlist!!

alter table staging.csvlist
    add tablename character varying(64),
    add import_date timestamp with time zone;

update staging.csvlist
set tablename = replace(substring(csvfile from E'[^/]+\\.csv$'), '.csv', '');

create temporary view unloaded as
select * from staging.csvlist
where import_date is null
limit 1;

-- !x! if(hasrows(unloaded))
-- !x!     write "Importing CSV files from !!dir!!"
-- !x!     include import1.sql
-- !x!     write "Done."
-- !x! endif

The second script, which must be named import1.sql, in accordance with the reference to it in the first script, looks like this:

-- !x! select_sub unloaded
-- !x! write "    !!@tablename!!"
-- !x! import to new staging.!!@tablename!! from !!@csvfile!!
update staging.csvlist
set import_date = current_timestamp
where tablename = '!!@tablename!!';
-- !x! if(hasrows(unloaded)) { include import1.sql }

This example is designed to run on a Linux system with PostgreSQL, but the technique can be applied in other environments and with other DBMSs.
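The table-name derivation used in the UPDATE statement above (keep the base file name, drop the .csv extension) can be sketched in Python:

```python
import re

def table_name(csv_path):
    """Strip the directory and the .csv extension, as the UPDATE does."""
    base = re.search(r"[^/]+\.csv$", csv_path).group(0)
    return base[:-len(".csv")]

name = table_name("/data/incoming/storm_tracks.csv")  # hypothetical path
```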

Example 14: Run a Script from a Library Database

Despite the advantages of storing scripts on the file system, in some cases storing a set of scripts in a library database may be appropriate. Consider a table named scriptlib that is used to store SQL scripts, and that has the following columns:

  • script_name: A name used as a unique identifier for each script; the primary key of the table.
  • script_group: A name used to associate related scripts.
  • group_master: A Boolean used to flag the first script in a group that is to be executed.
  • script_text: The text of the SQL script.

A single script from such a library database can be run using another script like the following:

-- !x! sub_tempfile scriptfile
create temporary view to_run as
select script_text from scriptlib
where script_name = 'check_sdg';
-- !x! export to_run to !!scriptfile!! as plain
-- !x! include !!scriptfile!!

This technique could be combined with a prompt for the script to run, using the method illustrated in Example 8, to create a tool that allows interactive selection and execution of SQL scripts.

This technique can be extended to export all scripts with the same script_group value, and then to run the master script for that group. To use this approach, the filename used with the IMPORT metacommand in each script must be a substitution variable that is to be replaced with the name of a temporary file created with the SUB_TEMPFILE metacommand.
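As a sketch of the library pattern, the following Python uses the sqlite3 module to create a minimal scriptlib table, pull out one script's text, and write it to a temporary file of the kind that the INCLUDE metacommand would then read (the stored script text is illustrative only):

```python
import os
import sqlite3
import tempfile

# Build a minimal script library, then extract one script to a temporary
# file -- the file that an INCLUDE metacommand would subsequently read.
con = sqlite3.connect(":memory:")
con.execute("""create table scriptlib (
    script_name text primary key,
    script_group text,
    group_master integer,
    script_text text)""")
con.execute("insert into scriptlib values (?, ?, ?, ?)",
            ("check_sdg", "qa_checks", 1,
             "select count(*) from dt_sample;"))  # illustrative script text

row = con.execute("select script_text from scriptlib"
                  " where script_name = 'check_sdg'").fetchone()
fd, scriptfile = tempfile.mkstemp(suffix=".sql")
with os.fdopen(fd, "w") as f:
    f.write(row[0])            # export to_run to !!scriptfile!! as plain
with open(scriptfile) as f:
    script = f.read()          # what INCLUDE would then execute
os.remove(scriptfile)
```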

Example 15: Prompt for Multiple Values

The PROMPT SELECT_SUB metacommand allows the selection of only one row of data at a time. Multiple selections can be obtained, however, by using the PROMPT SELECT_SUB metacommand in a loop and accumulating the results in another variable or variables.

This example illustrates that process, using a main script that INCLUDEs another script, choose2.sql, to present the prompt and accumulate the choices in the desired form.

The main script looks like this:

-- !x! sub prompt_msg Choose a sample material, or Cancel to quit.
create or replace temporary view vw_materials as
select sample_material, description from e_sampmaterial;
-- !x! cancel_halt off
-- !x! include choose2.sql
-- !x! cancel_halt on
-- !x! if(sub_defined(in_list))
           create or replace temporary view sel_samps as
           select * from d_sampmain
           where sample_material in (!!in_list!!);
-- !x!     prompt message "Selected samples." display sel_samps
-- !x! endif

The script that is included, choose2.sql, looks like this:

-- !x! prompt select_sub vw_materials message "!!prompt_msg!!"
-- !x! if(sub_defined(@sample_material))
-- !x!     if(not sub_defined(in_list))
-- !x!         sub in_list '!!@sample_material!!'
-- !x!         sub msg_list !!@sample_material!!
-- !x!     else
-- !x!         sub in_list !!in_list!!, '!!@sample_material!!'
-- !x!         sub msg_list !!msg_list!!, !!@sample_material!!
-- !x!     endif
           create or replace temporary view vw_materials as
           select sample_material, description from e_sampmaterial
           where sample_material not in (!!in_list!!);
-- !x!     sub prompt_msg You have chosen !!msg_list!!; choose another, or Cancel to quit.
-- !x!     include choose2.sql
-- !x! endif

In this example, only one value from each of the multiple selections is accumulated into a single string in the form of a list of SQL character values suitable for use with the SQL in operator. The multiple values could also be accumulated in the form of a values list, if appropriate to the intended use.
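The accumulation logic in choose2.sql can be sketched in Python (the sample material codes are hypothetical):

```python
# Each selection is quoted and appended to a growing IN-list string,
# mirroring the "sub in_list" metacommands in choose2.sql.
def add_selection(in_list, value):
    quoted = "'%s'" % value
    return quoted if in_list is None else in_list + ", " + quoted

in_list = None
for choice in ("SOIL", "SED", "WATER"):  # hypothetical sample materials
    in_list = add_selection(in_list, choice)

where_clause = "where sample_material in (%s)" % in_list
```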

Another approach to handling multiple selection is to reassign each selected value to another substitution variable that has a name that is dynamically created using a counter variable, as shown in the following snippet.

-- !x!     sub selections !!$counter_1!!
-- !x!     sub sample_material_!!selections!! !!@sample_material!!
-- !x!     sub description_!!selections!! !!@description!!

Example 16: Evaluating Complex Expressions with Substitution Variables

Although execsql does not itself process mathematical expressions or other similar operations on substitution variables, all of the functions of SQL and the underlying DBMS can be used to evaluate complex expressions that use substitution variables. For example:

-- !x! sub var1 56
-- !x! sub var2 October 23
create temporary view var_sum as
select cast(right('!!var2!!', 2) as integer) + !!var1!! as x;
-- !x! subdata sum var_sum

This will assign the result of the expression to the substitution variable “sum”. Any mathematical, string, date, or other functions supported by the DBMS in use can be applied to substitution variables in this way.
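The same idea can be sketched directly with Python's sqlite3 module, letting the DBMS evaluate the expression after the substitution variables are expanded (SQLite's substr(x, -2) stands in for the right() function used above):

```python
import sqlite3

# Let the DBMS do the arithmetic: expand the substitution variables into
# the SQL text, run the query, and capture the single result value
# (the step that the SUBDATA metacommand performs in execsql).
subs = {"var1": "56", "var2": "October 23"}
sql = "select cast(substr('!!var2!!', -2) as integer) + !!var1!! as x"
for name, value in subs.items():
    sql = sql.replace("!!%s!!" % name, value)

con = sqlite3.connect(":memory:")
result = con.execute(sql).fetchone()[0]
```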

Example 17: Displaying Summary and Detailed Information

A set of QA checks performed on data may be summarized as a list of all of the checks that failed; however, there may also be detailed information about those results that the user would like to see—such as a list of all the data rows that failed. Assuming that a view has been created for each QA check, and that the QA check failures have been compiled into a table of this form (see also Example 3):

create table qa_results (
    description character varying(80),
    detail_view character varying(24)
    );

The detail_view column should contain the name of the view with detailed information about the QA failure. Both the summary and the detailed information can be presented to the user using the following statements in the main script:

-- !x! if(hasrows(qa_results))
    -- !x! sub prompt_msg Select an item to view details, Cancel to quit data loading.
    -- !x! include qa_detail.sql
-- !x! endif

Where the qa_detail.sql script is as follows:

-- !x! prompt select_sub qa_results message "!!prompt_msg!!" continue
-- !x! if(sub_defined(@detail_view))
    -- !x! prompt message "!!@description!!" display !!@detail_view!!
    -- !x! include qa_detail.sql
-- !x! endif

The user can cancel further script processing using the “Cancel” button on either the summary dialog box or any of the detail displays. If the “Continue” button is chosen on the summary dialog box, script processing will resume.

Example 18: Creating a Simple Entry Form

This example illustrates the creation of a simple data entry form using the PROMPT ENTRY_FORM metacommand. In this example, the form is used to get a numeric value and a recognized set of units for that value, and then display that value converted to all compatible units in the database.

This example relies on the presence of a unit dictionary in the database (e_unit) that contains the unit code, the dimension of that unit, and a conversion factor to convert values to a standard unit for the dimension. This example uses two scripts, named unit_conv.sql and unit_conv2.sql, the first of which INCLUDEs the second.

unit_conv.sql

create temporary table inp_spec (
    sub_var text,
    prompt text,
    required boolean,
    initial_value text,
    width integer,
    lookup_table text,
    validation_regex text,
    validation_key_regex text,
    sequence integer
    );

insert into inp_spec
    (sub_var, prompt, required, width, lookup_table, validation_regex, validation_key_regex, sequence)
values
    ('value', 'Numeric value', True, 12, null, '-?[0-9]+\\.?[0-9]*', '-?[0-9]?\\.?[0-9]*', 1),
    ('unit', 'Unit', True, null, 'e_unit', null, null, 2),
    ('comment', 'Comment', False, 40, null, null, null, 3)
    ;

create or replace temporary view entries as
select * from inp_spec order by sequence;

-- !x! prompt entry_form entries message "Enter a value, unit, and comment."

-- !x! include unit_conv2.sql

Note that to include a decimal point in the regular expressions for the numeric value, the decimal point must be escaped twice: once for SQL, and once for the regular expression itself. Also note that in this case, the validation_regex and the validation_key_regex are identical except that all subexpressions in the latter are optional. If the first digit character class were not optional, then at least one digit would always be required, and entry of a leading negative sign would not be possible (though a negative sign could be added after at least one digit was entered).
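The relationship between the two patterns can be checked with Python's re module (single backslashes here, since no SQL-string escaping is involved): the full validation pattern must match a finished entry, while the key pattern must match every partial entry typed along the way.

```python
import re

# The validation pattern must match the complete entry; the key pattern,
# with all subexpressions optional, must match every prefix of it.
validation_regex = r"-?[0-9]+\.?[0-9]*"
validation_key_regex = r"-?[0-9]?\.?[0-9]*"

entry = "-1.5"
complete_ok = re.fullmatch(validation_regex, entry) is not None
partials_ok = all(re.fullmatch(validation_key_regex, entry[:i]) is not None
                  for i in range(len(entry) + 1))
# The empty string is a valid partial entry but not a valid final entry.
empty_is_final = re.fullmatch(validation_regex, "") is not None
```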

unit_conv2.sql

create or replace temporary view conversions as
select
    !!value!! * su.factor / du.factor as converted_value,
    du.unit
from
    e_unit as su
    inner join e_unit as du on du.dimension=su.dimension and du.unit <> su.unit
where
    su.unit='!!unit!!'
order by
    du.unit;

update inp_spec
set initial_value='!!value!!'
where sub_var = 'value';
-- !x! sub old_value !!value!!

update inp_spec
set initial_value='!!unit!!'
where sub_var = 'unit';
-- !x! sub old_unit !!unit!!

-- !x! if(sub_defined(comment))
update inp_spec
set initial_value='!!comment!!'
where sub_var = 'comment';
-- !x! endif

-- !x! prompt entry_form entries message "The conversions for !!value!! !!unit!! are:" display conversions

-- !x! if(not equals("!!unit!!", "!!old_unit!!"))
-- !x! orif(not equals("!!value!!", "!!old_value!!"))
-- !x! include unit_conv2.sql
-- !x! endif

The unit_conv2.sql script will continue to display conversions for as long as either the value or the unit is changed.

Example 19: Dynamically Altering a Table Structure to Fit Data

Example contributed by E. Shea.

This example illustrates automatic (scripted) revisions to a table structure, wherein a number of additional columns are added to a table; the number of columns added is determined by the data. The case illustrated here is of a Postgres table containing a PostGIS multipoint geometry column. The purpose of this script is to extract the coordinate points from the geometry column and store each point as a pair of columns containing latitude and longitude values. The number of points in the multipoint column varies from row to row, and the maximum number of points across all rows is not known (and need not be known) when this script is run.

This example assumes the existence of a database table named sample_points that contains the following two columns:

  • sample: a text value uniquely identifying each row.
  • locations: a multi-point PostGIS geometry value.

This operation is carried out using two scripts, named expand_geom.sql and expand_geom2.sql. The first of these calls the second. Looping and a counter variable are used to create and fill as many additional columns as are needed to contain all the point coordinates.

expand_geom.sql

create temporary view vw_pointcount as
select max(ST_NumGeometries(locations)) as point_len
from sample_points;

--!x! subdata point_len vw_pointcount

drop table if exists tt_samp_loc_points;
create temporary table tt_samp_loc_points (
    sample text,
    locations geometry,
    location_srid integer,
    point_count integer
    );

insert into tt_samp_loc_points
    (sample, locations, location_srid, point_count)
select
    sample, locations, ST_SRID(locations), ST_NumGeometries(locations)
from sample_points;

--!x! include expand_geom2.sql

expand_geom2.sql

--!x! sub point_number !!$counter_1!!

alter table tt_samp_loc_points
    add column point_!!point_number!!_longitude double precision,
    add column point_!!point_number!!_latitude double precision;

update tt_samp_loc_points
set
    point_!!point_number!!_longitude = ST_X(ST_GeometryN(locations, !!point_number!!)),
    point_!!point_number!!_latitude = ST_Y(ST_GeometryN(locations, !!point_number!!))
where
    point_count>=!!point_number!!;

--!x! if(is_gt(!!point_len!!, !!point_number!!)) {include expand_geom2.sql}
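The same add-as-many-columns-as-the-data-need technique can be sketched in standalone Python with sqlite3. Because PostGIS functions are not available in SQLite, this sketch substitutes a semicolon-delimited text column of x,y pairs for the multipoint geometry; the table and column names are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table sample_points (sample text, points text)")
rows = [("s1", "1,2;3,4;5,6"), ("s2", "7,8")]
con.executemany("insert into sample_points values (?,?)", rows)

# Like max(ST_NumGeometries(locations)): the widest row determines
# how many column pairs to add.
max_pts = max(len(p.split(";")) for _, p in rows)

# Identifiers cannot be parameterized, so the column names are built
# with string formatting, just as the script builds them with !!point_number!!.
for n in range(1, max_pts + 1):
    con.execute(f"alter table sample_points add column point_{n}_x real")
    con.execute(f"alter table sample_points add column point_{n}_y real")

# Fill each pair of columns; rows with fewer points keep nulls.
for sample, pts in rows:
    for i, pair in enumerate(pts.split(";"), start=1):
        x, y = pair.split(",")
        con.execute(
            f"update sample_points set point_{i}_x=?, point_{i}_y=? where sample=?",
            (float(x), float(y), sample))
```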

Example 20: Logging Data Quality Issues

This example illustrates how data quality issues that are encountered during data loading or cleaning operations can be logged for later evaluation and resolution. Issues are logged in a SQLite database in the working directory. This database is named issue_log.sqlite and is automatically created if necessary. The database contains one table named issue_log in which all issues are recorded. The issue log database may also contain additional tables that provide data to illustrate the issues. Each of these additional tables has a name that starts with “issue_”, followed by an automatically-assigned issue number.

The script that logs the issues is named log_issue.sql. It should be included in the main script at every point where an issue is to be logged. Three substitution variables are used to pass information to this script:

  • dataset: The name of the data set to which this issue applies.
  • issue: A description of the issue.
  • issue_data: The name of a table or view containing a data summary that illustrates the issue. This substitution variable need not be defined if no illustrative data are necessary or applicable (use the RM_SUB metacommand to un-define this variable if it has been previously used).

Each issue is logged only once. The issue_log table is created with additional columns that may be used to record the resolution of each issue, and these are not overwritten if an issue is encountered repeatedly.

The log_issue.sql script uses several substitution variables with names that start with “iss_”, and uses counter variable 15503. Other parts of the loading and cleaning scripts should avoid collisions with these values.

log_issue.sql

-- ------------------------------------------------------------------
-- Save the current alias to allow switching back from the SQLite issue log.
-- ------------------------------------------------------------------
-- !x! sub iss_calleralias !!$CURRENT_ALIAS!!

-- ------------------------------------------------------------------
-- Connect to the issue log.  Create it if it doesn't exist.
-- ------------------------------------------------------------------
-- !x! if(alias_defined(iss_log))
-- !x!      use iss_log
-- !x! else
-- !x!      if(file_exists(issue_log.sqlite))
-- !x!              connect to sqlite(file=issue_log.sqlite) as iss_log
-- !x!              use iss_log
        create temporary view if not exists iss_no_max as
        select max(issue_no) from issue_log;
-- !x!              subdata iss_max iss_no_max
-- !x!              set counter 15503 to !!iss_max!!
-- !x!      else
-- !x!              connect to sqlite(file=issue_log.sqlite, new) as iss_log
-- !x!              use iss_log
        create table issue_log (
            issue_no integer,
            data_set text,
            issue_summary text,
            issue_details text,
            resolution_strategy text,
            resolution_reviewer text,
            reviewer_comments text,
            final_resolution text,
            resolution_implemented date,
            resolution_script text
            );
-- !x!      endif
-- !x! endif

-- ------------------------------------------------------------------
-- Add the new issue if it does not already exist.
-- ------------------------------------------------------------------
drop view if exists check_issue;
create temporary view check_issue as
    select issue_no
    from issue_log
    where data_set='!!dataset!!'
        and issue_summary='!!issue!!';

-- !x! if(not hasrows(check_issue))
-- !x!      sub iss_no !!$counter_15503!!
    insert into issue_log
        (issue_no, data_set, issue_summary)
    values (
        !!iss_no!!,
        '!!dataset!!',
        '!!issue!!'
        );
-- !x! else
-- !x!      subdata iss_no check_issue
-- !x! endif

-- ------------------------------------------------------------------
-- Save the issue data if there is any and it has not already
-- been saved.
-- ------------------------------------------------------------------
-- !x! if(sub_defined(issue_data))
-- !x!      sub iss_table issue_!!iss_no!!
-- !x!      if(not table_exists(!!iss_table!!))
-- !x!              copy !!issue_data!! from !!iss_calleralias!! to replacement !!iss_table!! in iss_log
-- !x!      endif
-- !x! endif

-- ------------------------------------------------------------------
-- Restore the caller alias
-- ------------------------------------------------------------------
-- !x! use !!iss_calleralias!!
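The log-each-issue-only-once behavior at the heart of this script can be sketched in standalone Python with sqlite3. The table is simplified and the log_issue function is hypothetical; it returns the existing issue number when the same dataset/issue pair is seen again:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""create table issue_log (
    issue_no integer, data_set text, issue_summary text,
    resolution_strategy text)""")

def log_issue(con, dataset, issue, next_no):
    """Insert the issue only if this (dataset, issue) pair is new;
    either way, return its issue number. Existing rows, which may
    already record a resolution, are never overwritten."""
    row = con.execute(
        "select issue_no from issue_log where data_set=? and issue_summary=?",
        (dataset, issue)).fetchone()
    if row:
        return row[0]
    con.execute(
        "insert into issue_log (issue_no, data_set, issue_summary) values (?,?,?)",
        (next_no, dataset, issue))
    return next_no

# Logging the same issue twice yields the same number and one row.
n1 = log_issue(con, "data_set_524.csv", "Multiple locations for a sample.", 1)
n2 = log_issue(con, "data_set_524.csv", "Multiple locations for a sample.", 2)
```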

This script would be used during a data loading or cleaning process as illustrated in the following code snippet.

-- ------------------------------------------------------------------
-- Define the data set both for import and for issue logging.
-- ------------------------------------------------------------------
-- !x! sub dataset data_set_524.csv
-- !x! write "Importing data."
-- !x! import to replacement data524 from !!dataset!!

-- ------------------------------------------------------------------
--  Check for multiple stations per sample.
-- ------------------------------------------------------------------
-- !x! write "Checking for multiple stations per sample."
create temporary view samploc as
    select distinct loc_name, sample_name
    from data524;

create temporary view dupsamplocs as
    select sample_name
    from samploc
    group by sample_name
    having count(*) > 1;

create or replace temporary view allsampdups as
    select samploc.sample_name, loc_name
    from samploc
        inner join dupsamplocs as d on d.sample_name=samploc.sample_name
    order by samploc.sample_name;

-- !x! if(hasrows(dupsamplocs))
-- !x!      sub issue Multiple locations for a sample.
-- !x!      sub issue_data allsampdups
-- !x!      include log_issue.sql
-- !x! endif

Example 21: Updating Multiple Databases with a Cross-Database Transaction

This example illustrates how the same SQL script can be applied to multiple databases, and the changes committed only if they were successful for all databases. This makes use of the looping technique illustrated in Example 6, but using sub-scripts defined with BEGIN/END SCRIPT metacommands instead of INCLUDE metacommands. The same approach, of committing changes only if there were no errors in any database, could also be done without looping, simply by unrolling the loops to apply the updates to each database in turn. This latter approach would be necessary when different changes were to be made to each database—though even in that case, the commit statements could all be executed in a loop.
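The commit-only-if-all-updates-succeeded pattern can be sketched in standalone Python, using sqlite3 in place of Postgres; the function name and its arguments are illustrative:

```python
import sqlite3

def update_all_dbs(conns, sql_statements):
    """Apply the same updates to every open connection; commit all of
    them only if every update succeeded, otherwise roll all back."""
    try:
        for con in conns:
            for sql in sql_statements:
                con.execute(sql)   # sqlite3 opens a transaction implicitly
    except sqlite3.Error:
        for con in conns:
            con.rollback()
        return False
    for con in conns:
        con.commit()               # reached only if no database errored
    return True
```

As in the metacommand version, the commits form a separate pass after all updates, so an error in any one database leaves every database unchanged.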

-- =========================================================
--          Setup
-- Put database names in a table so that we can loop through
-- the list and process the databases one by one.
-- ---------------------------------------------------------
create temporary table dblist (
    database text,
    db_alias text,
    updated boolean default false
    );

insert into dblist
    (database)
values
    ('db1'), ('db2'), ('db3');

update dblist
set db_alias = 'conn_' || database;

create temporary view unupdated as
select *
from dblist
where not updated
limit 1;
-- =========================================================


-- =========================================================
--        Script: Update a single database
-- Data variables @database and @db_alias should be defined.
-- ---------------------------------------------------------
-- !x! BEGIN SCRIPT update_database
-- !x! connect to postgresql(server=myserver, db=!!@database!!, user=!!$db_user!!, need_pwd=True) as !!@db_alias!!
-- !x! use !!@db_alias!!

-- Begin a transaction
begin;

<SQL commands to carry out the update>

-- !x! use initial
-- !x! END SCRIPT
-- =========================================================


-- =========================================================
--        Script: Update all databases
-- ---------------------------------------------------------
-- !x! BEGIN SCRIPT update_all_dbs
-- !x! select_sub unupdated
-- !x! if(sub_defined(@database))
    -- !x! write "Updating !!@database!!"
    -- !x! execute script update_database
    update dblist set updated=True where database='!!@database!!';
    -- !x! execute script update_all_dbs
-- !x! endif
-- !x! END SCRIPT
-- =========================================================


-- =========================================================
--        Script: Commit changes to all databases
-- ---------------------------------------------------------
-- !x! BEGIN SCRIPT commit_all_dbs
-- !x! select_sub unupdated
-- !x! if(sub_defined(@db_alias))
    -- !x! write "Committing changes to !!@database!!"
    -- !x! use !!@db_alias!!
    commit;
    -- !x! use initial
    update dblist set updated=True where database='!!@database!!';
    -- !x! execute script commit_all_dbs
-- !x! endif
-- !x! END SCRIPT
-- =========================================================


-- =========================================================
--        Update all databases
-- ---------------------------------------------------------
-- !x! autocommit off
-- !x! execute script update_all_dbs

-- If there was an error during updating of any database,
-- the error should halt execsql, and the following code
-- will not run.
update dblist set updated=False;
-- !x! execute script commit_all_dbs
-- =========================================================

Example 22: Exporting with a Template to Create a Wikipedia Table

This example illustrates how the EXPORT metacommand can be used with the Jinja2 template processor to create a simple Wikipedia table. To run this example, Jinja2 must be specified as the template processor to use; the configuration file must contain the following lines:

[output]
template_processor=jinja

The following template file, used with the EXPORT metacommand, produces output that renders the exported data as a Wikipedia table.

{| class="wikitable"{% for hdr in headers %}
! scope="col" | {{ hdr }}{% endfor %}
|-
{% for row in datatable %}{% for hdr in headers %}{% for key, value in row.items() %}{% if  key == hdr %}| {{ value }}
{% endif %}{% endfor %}{% endfor %}|-
{% endfor %}

The output produced using this template will look like:

{| class="wikitable"
! scope="col" | row_id
! scope="col" | row_number
! scope="col" | long_text
! scope="col" | some_date
! scope="col" | some_number
|-
| Row 1
| 1
| Lorem ipsum dolor sit amet, consectetuer adipiscing elit.
| 1951-03-18
| 67.7593972309
|-
| Row 2
| 2
| Aenean commodo ligula eget dolor. Aenean massa.
| 1951-03-19
| 26.7195924129
|-
. . .
|}

This template will work with any exported data table.
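For comparison, the same markup can be produced by a short standalone Python function using plain string handling instead of Jinja2; the header names and row data below are hypothetical:

```python
def wikitable(headers, rows):
    """Render a list of dict rows as MediaWiki table markup, matching
    the output of the Jinja2 template above."""
    lines = ['{| class="wikitable"']
    lines += ['! scope="col" | ' + h for h in headers]
    lines.append('|-')
    for row in rows:
        lines += ['| {}'.format(row[h]) for h in headers]
        lines.append('|-')
    lines.append('|}')
    return '\n'.join(lines)

markup = wikitable(
    ["row_id", "row_number"],
    [{"row_id": "Row 1", "row_number": 1},
     {"row_id": "Row 2", "row_number": 2}])
```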

Example 23: Validation of PROMPT ENTRY_FORM Entries

Example contributed by E. Shea.

This example illustrates how validation of interactively entered data can be performed, and the user prompted repeatedly until the entered data are valid.

Although the PROMPT ENTRY_FORM metacommand allows validation of individual entries, this capability is limited, and cannot be used to cross-validate different entries. The following code demonstrates a prompt for a pair of dates, where validation of both the individual entries and a properly-ordered relationship between the two entries is carried out using a SCRIPT that re-runs itself if the data are not valid.

create temporary table tt_datespecs (
    sub_var text,
    prompt text,
    required boolean default True,
    sequence integer
    );
insert into tt_datespecs (sub_var, prompt, sequence)
values
    ('min_date', 'Start date', 1),
    ('max_date', 'End date', 2);

-- !x! begin script getdates
   -- !x! if(sub_defined(getdate_error)) { sub getdate_prompt !!getdate_error!! }
   -- !x! sub_append getdate_prompt Enter date range to include:
   -- !x! prompt entry_form tt_datespecs message "!!getdate_prompt!!"

   drop view if exists qa_date1;
   drop view if exists qa_date2;

   -- !x! error_halt off
    create or replace temporary view qa_date1 as
    select
        cast('!!min_date!!' as date) as start_date,
        cast('!!max_date!!' as date) as end_date;
    -- !x! error_halt on
    -- !x! if(sql_error())
        -- !x! sub getdate_error ** ERROR **  One or both of the values you entered are not interpretable as a date.
        -- !x! execute script getdates
    -- !x! else
            create or replace temporary view qa_date2 as
            select 1
            where  cast('!!min_date!!' as date) > cast('!!max_date!!' as date);
            -- !x! if(hasrows(qa_date2))
            -- !x!     sub getdate_error ** ERROR **  The end date must be equal to or later than the start date.
            -- !x!     execute script getdates
            -- !x! endif
    -- !x! endif

    -- !x! rm_sub getdate_error

-- !x! end script
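The control flow of the getdates script, validating both entries and then their ordering, and re-prompting on any failure, can be sketched in standalone Python. The interactive prompt loop is replaced here by an iterator of hypothetical user responses:

```python
from datetime import date

def get_date_range(responses):
    """Keep 'prompting' until both strings parse as dates and the start
    is not after the end; collect the error messages that a real
    prompt would display before each re-prompt."""
    errors = []
    for start_s, end_s in responses:
        try:
            start = date.fromisoformat(start_s)
            end = date.fromisoformat(end_s)
        except ValueError:
            errors.append("One or both values are not interpretable as a date.")
            continue
        if start > end:
            errors.append("The end date must be equal to or later than the start date.")
            continue
        return (start, end), errors
    raise ValueError("no valid date range entered")

result, errs = get_date_range([
    ("2020-13-01", "2020-01-31"),   # bad month: re-prompt
    ("2020-02-01", "2020-01-01"),   # out of order: re-prompt
    ("2020-01-01", "2020-02-01")])  # accepted
```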