Walkthrough

This guide walks you through each capability of the Analytics package. All of the examples assume you have Python 3.5 or later installed and access to a Jupyter Notebook.

Install

From your terminal or command prompt, simply run:

pip install demyst-analytics

Imports

Once the package is installed, all we need to do is import the Analytics package and make sure there are no errors.

from demyst.analytics import Analytics

analytics = Analytics()

If you see something like this:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-eb44c534f221> in <module>()
----> 1 from demyst.analytics import Analytics

ModuleNotFoundError: No module named 'demyst'

Then your Jupyter environment isn't using the Python installation that you ran pip install from.
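
One way to confirm this from inside the notebook is to print the interpreter the kernel is running and check whether the demyst package can be found from it. The sketch below uses only the standard library; the helper name package_is_importable is ours and is not part of the Analytics package.

```python
import importlib.util
import sys


def package_is_importable(name):
    """Return True if the running interpreter can locate the top-level package `name`."""
    return importlib.util.find_spec(name) is not None


# The interpreter the notebook kernel is actually using.
print("Kernel interpreter:", sys.executable)

if not package_is_importable("demyst"):
    # Installing through this exact interpreter keeps pip and the kernel in sync.
    print("demyst not found. In a notebook cell, try:")
    print("!{} -m pip install demyst-analytics".format(sys.executable))
```

Running pip through the kernel's own interpreter avoids the mismatch, because the package lands in the same environment the notebook imports from.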

Load Input File

Let's load a CSV of publicly available business information into a Pandas DataFrame:
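
As a minimal sketch of this step, the snippet below reads a small in-memory sample with pandas. The column names and rows are made up for illustration; in practice you would pass the path of your own CSV file to pd.read_csv instead.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real CSV file.
sample_csv = io.StringIO(
    "Name,Address,City,State,Zip\n"
    "Acme Hardware,12 Main St,Springfield,IL,62701\n"
    "Blue Sky Cafe,400 Oak Ave,Portland,OR,97205\n"
)

# dtype=str keeps values such as ZIP codes from being parsed as numbers,
# which would drop any leading zeros.
inputs = pd.read_csv(sample_csv, dtype=str)
print(inputs.head())
```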

Prepare Input File

Now let's use our validate function to see if the Demyst Platform recognizes any of our column types:

At the moment the DataFrame is unusable, but we can address this quickly by mapping the columns to the types Demyst recognizes:
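
A mapping like this can be expressed as a plain pandas column rename. The sketch below is illustrative only: both the source column names and the target type names (business_name, street, city, state, post_code) are assumptions, so rely on the output of validate for the names the platform actually reports.

```python
import pandas as pd

# Hypothetical input with column names the platform would not recognize.
inputs = pd.DataFrame({
    "Name": ["Acme Hardware"],
    "Address": ["12 Main St"],
    "City": ["Springfield"],
    "State": ["IL"],
    "Zip": ["62701"],
})

# Assumed mapping from our column names to recognized type names.
column_types = {
    "Name": "business_name",
    "Address": "street",
    "City": "city",
    "State": "state",
    "Zip": "post_code",
}

# rename returns a new DataFrame; the data itself is unchanged.
inputs = inputs.rename(columns=column_types)
print(list(inputs.columns))
```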

Now we are ready to access some data!

Search Relevant Data Products

Now let's take our prepared input file and feed it to our search function. The first thing you will notice is that Jupyter asks you for a username and password.

Warning

If you don't have a set of credentials yet, head on over to the Demyst Console and sign up!

Info

The search function only sends meta information to the Demyst Platform at this stage.

You can now browse through hundreds of data providers that could match against your input file. Some of the providers shown require more input data than we provided, though. Let's change our search query to show only data products that can match the exact types in our input file by adding the optional strict=True parameter.
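
In code, the stricter query differs from the first search only by the extra keyword argument. This is a minimal sketch that assumes search takes the prepared DataFrame as its first argument; the wrapper function search_exact_matches is ours, added only to make the call shape explicit.

```python
def search_exact_matches(analytics, inputs):
    """Search for data products that can match the exact types in `inputs`.

    `analytics` is an Analytics instance and `inputs` is the prepared DataFrame.
    """
    # strict=True is the optional parameter described above.
    return analytics.search(inputs, strict=True)


# Typical use in a notebook cell (requires credentials):
# search_exact_matches(analytics, inputs)
```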

Enrich Input File

Now let's choose some data products and enrich our input file:

We've chosen the infutor_property_append and owler_search data products to enrich our business dataset. We pass them to the enrich function, which also takes our input DataFrame as a parameter and returns a job ID. The enrichment job is asynchronous, so store the ID in a variable and use it again in enrich_download.
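
The flow described here can be sketched as a small helper. The argument order of enrich (product names first, then the input DataFrame) is an assumption, and the helper name run_enrichment is ours; adjust both to match the package version you have installed.

```python
PRODUCTS = ["infutor_property_append", "owler_search"]


def run_enrichment(analytics, inputs, products=PRODUCTS):
    """Start an enrichment job and download its results.

    Returns the job ID together with whatever enrich_download returns.
    """
    # Assumed argument order: product names, then the input DataFrame.
    job_id = analytics.enrich(products, inputs)
    # Keep the job ID: it is the handle used to fetch the results.
    results = analytics.enrich_download(job_id)
    return job_id, results


# Typical use in a notebook cell (requires credentials):
# job_id, results = run_enrichment(analytics, inputs)
```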

By default, enrich_download blocks until all the files are ready. If you would like to download partial files instead, pass block_until_complete=False to enrich_download:
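
A minimal sketch of the non-blocking call, assuming enrich_download takes the job ID as its first argument; the wrapper name download_partial is ours.

```python
def download_partial(analytics, job_id):
    """Download whatever results are ready for `job_id` without waiting for the rest."""
    # block_until_complete=False is the parameter named above.
    return analytics.enrich_download(job_id, block_until_complete=False)


# Typical use in a notebook cell, with job_id returned earlier by enrich:
# partial_results = download_partial(analytics, job_id)
```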

Downloading partial results can be helpful when one of the data products you have selected is considerably slower than the others. enrich_status can also be used to help you understand how far along your enrichment job is.
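
If you prefer to wait in a loop rather than block inside enrich_download, a polling helper along these lines is one option. It assumes enrich_status takes the job ID. Because the shape of its return value is not described here, the caller supplies the is_finished check; wait_for_job, poll_seconds and max_polls are hypothetical names.

```python
import time


def wait_for_job(analytics, job_id, is_finished, poll_seconds=30, max_polls=120):
    """Poll enrich_status until `is_finished(status)` is true, then return that status.

    Raises TimeoutError if the job has not finished after `max_polls` checks.
    """
    for _ in range(max_polls):
        status = analytics.enrich_status(job_id)
        if is_finished(status):
            return status
        # Pause between checks so the platform is not queried in a tight loop.
        time.sleep(poll_seconds)
    raise TimeoutError("Job {} did not finish after {} checks".format(job_id, max_polls))
```

Passing the completion check in as a function keeps the helper independent of the exact status format.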

That should get you accessing data! If you run into any problems, feel free to reach out to support@demystdata.com.