A task is a child of a project and contains the details of the job to be run. A typical task will run a SQL query, format the results, and upload them as a file to an SFTP server.
Tasks include three primary pages:
All Tasks: This loads a paginated table of all tasks, along with a name cloud of task owners (owners come from the project level) and a name cloud of the associated projects.
My Tasks: This loads a paginated table of your tasks, along with a name cloud of the associated projects.
Task Details: A summary page of all the task details, a few controls, and the run history. From this page a task can be:
Rescheduled. This option will remove any existing schedules from the Scheduler API and recreate them.
Duplicated. This option will clone the job in a disabled state.
Cancel Retry. If the task has failed and a retry has been scheduled, the user can click here to prevent the retries. This option cannot be clicked while the task is running.
Below these controls, the remaining task details are listed, along with a log and a listing of historical file outputs.
The history files can be downloaded or resent to the current destination.
If a task has a non-catastrophic failure, meaning that the app is still running and the error has been logged, there is an option to attempt multiple retries. This is especially useful when uploading files to SFTP servers, where you cannot control upload speeds and may be subject to intermittent upload errors.
In cases like this it is advisable to allow the task to rerun.
A source organization and database can be selected.
The option “Include Query Headers” will include/exclude column headings from the output.
A query location must be chosen. The query can be loaded from SFTP, FTP, SMB, GitLab, a web URL (for example, GitHub with an access token), or entered directly as “source code”.
Note
Queries will be modified at runtime. A preview of the modifications is available on the Task Details page. For example, USE statements are not allowed over ODBC, so tables must be fully qualified or a database must be specified in the connection. The app will automatically remove unauthorized statements from SQL queries.
SQL queries keep a running row count as they execute, and the current count is logged to the task log.
There is an option to cache queries loaded from GitLab and web URLs when the app runs; if there is ever a connection issue pulling the code, the cached version will be used. The SQL is cached before parameters are inserted, so parameters will continue to work.
While a Python script can be run as an add-on to any data source (through a Processing Script, where data from the source is passed into a local file and can be modified by the script), choose the Python Script option if you only wish to run a Python script and have no other data source. In that case the script runs without any input.
Python scripts have access to task and project metadata, including parameters and any connection strings used. They are passed in through environment variables. Access them through os.environ:
import os
print(os.environ)
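For example, a single value can be read with os.environ.get. The variable name below is only a placeholder for illustration; the actual names depend on how the task and project are configured, and printing os.environ as above will show what is available:
import os
# "TASK_NAME" is a hypothetical variable name used for illustration only
task_name = os.environ.get("TASK_NAME", "unknown")
print(task_name)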
If a data source is selected, its output file will be passed into your Python script as an argument. The source file name can be picked up like this:
import sys
input_file = sys.argv[1]
All processing scripts are run in their own environment and dependencies are automatically installed as needed.
The Processing Script can be loaded from a number of locations. You can also load a multi-file package and choose which file to run.
A processing script can also return a new file name or file names; the format must be a list of full paths as strings. Here’s an example script that splits the output of a SQL job into multiple files.
Assume a SQL query of:
select 1 union all select 2
It will produce an output with two rows.
Assume a python script of:
import sys
import csv
from pathlib import Path
outPaths = []

input_file = sys.argv[1]  # this is where any new files should be placed
baseDir = Path(input_file).parent

# ...do something here with input...
# example: write something into several files
with open(Path(input_file), 'r') as source:
    r = csv.reader(source)
    for x, row in enumerate(r):
        currentFile = Path(baseDir / f'test-{x}.txt')
        outPaths.append(str(currentFile))
        with open(currentFile, 'w') as fp:
            fp.write(str(row))

# print the new paths back
print(outPaths)
This will produce two files (test-0.txt and test-1.txt). These files are what would be backed up or sent to a vendor. They would also be available for download on the file page.
After creating the file, and optionally encrypting it, an MD5 hash of the file is created and saved along with the file name, file size, etc., for future file verification.
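As a rough sketch (not the app’s own code), the stored hash can be reproduced later to verify a downloaded copy using Python’s hashlib; the file name below is a placeholder:
import hashlib

# read the file in chunks so large outputs are not loaded fully into memory
md5 = hashlib.md5()
with open('output.csv', 'rb') as f:  # 'output.csv' is a placeholder path
    for chunk in iter(lambda: f.read(8192), b''):
        md5.update(chunk)
print(md5.hexdigest())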
The output file can be loaded into an existing connection.
There is an option to attempt to replace existing files with the same name; however, this option is also set at the SFTP server level and the two settings may conflict.
You can also prevent an empty file (for SQL, an output with 0 rows, excluding headers) from uploading by checking the box to disable empty uploads. This will also prevent an empty file from being zipped and sent.
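As a minimal sketch of what counts as empty here (assuming a CSV output with query headers enabled), a file is empty when nothing follows the header row; this is an illustration, not the app’s own check:
import csv

with open('output.csv', newline='') as f:  # 'output.csv' is a placeholder path
    rows = list(csv.reader(f))

# the first row is the header; the output is "empty" when no data rows follow it
is_empty = len(rows) <= 1
print(is_empty)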
Getting a success/failure email from the extract can be useful. You also have the option to attach or embed the output file in the email. Use this with caution if you are querying sensitive data or have a file that may exceed your email size limit.
It is also possible to send a text message. Phone numbers can be entered as email addresses using an email-to-SMS gateway; you can find the gateway address for your provider here.
If there are phone numbers in the address list, they will receive a plain-text message sent by SMS containing only the success or error message you’ve configured in the app. If no message is entered, the default message will send.
There is an option to enable the task immediately. If this is checked, the task will be picked up and run the next time the project is triggered by the Scheduler API.