This guide walks you through each of the capabilities of the Analytics package. All of the examples assume you have Python 3.5 or greater installed and access to a Jupyter Notebook.
From your terminal or command prompt, simply run:
pip install demyst-analytics
Once the package is installed, all we need to do is import the Analytics package and ensure there are no errors.
from demyst.analytics import Analytics
analytics = Analytics()
If you see something like this:
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-eb44c534f221> in <module>()
----> 1 from demyst.analytics import Analytics

ModuleNotFoundError: No module named 'demyst'
Then your Jupyter environment isn't using the Python installation that you ran pip install from. Reinstall the package with the same Python interpreter that backs your notebook kernel.
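One way to rule out an interpreter mismatch is to build the install command from the interpreter the notebook itself is running on. The following is a minimal sketch using only the standard library; the helper name `pip_install_command` is hypothetical and not part of the Analytics package.

```python
import sys


def pip_install_command(package):
    """Build a pip command bound to the interpreter running this code."""
    # sys.executable is the path of the current Python interpreter, so
    # "-m pip" installs into the same environment the notebook kernel uses.
    return [sys.executable, "-m", "pip", "install", package]


command = pip_install_command("demyst-analytics")
print(" ".join(command))
```

Running this in a notebook cell prints the exact command to use; passing `command` to `subprocess.check_call` would execute it from within the notebook.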
Load Input File
Let's load a CSV of publicly available business information into a Pandas DataFrame:
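A minimal sketch of this step, assuming a small in-memory CSV in place of the real file. The column names and rows below are hypothetical sample data, not the dataset used in this guide.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for the real CSV of business data.
SAMPLE_CSV = """name,address,city,state,zip
Acme Hardware,12 Main St,Springfield,IL,62701
Example Bakery,400 Oak Ave,Boston,MA,02134
"""

# dtype=str keeps identifiers such as ZIP codes from being parsed as
# numbers, which would drop any leading zeros.
inputs = pd.read_csv(io.StringIO(SAMPLE_CSV), dtype=str)
print(inputs.head())
```

For a file on disk, pass its path to `pd.read_csv` instead of the `StringIO` object.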
Prepare Input File
Now let's use our validate function to see if the Demyst Platform recognizes any of our column types:
At the moment the DataFrame is unusable, but we can address this quickly by mapping the columns to the types Demyst recognizes:
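The mapping itself can be done with plain pandas. The sketch below assumes hypothetical source headers and hypothetical target type names (`business_name`, `street`, `post_code`); substitute the types that validate reports for your own file.

```python
import pandas as pd

# Hypothetical input whose headers the platform would not recognize.
inputs = pd.DataFrame({
    "name": ["Acme Hardware"],
    "address": ["12 Main St"],
    "zip": ["62701"],
})

# Hypothetical mapping from original headers to recognized type names;
# these names are assumptions made for illustration only.
column_types = {
    "name": "business_name",
    "address": "street",
    "zip": "post_code",
}

# rename returns a new DataFrame and leaves unmapped columns untouched.
inputs = inputs.rename(columns=column_types)
print(list(inputs.columns))
```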
Now we are ready to access some data!
Search Relevant Data Products
Now let's take our prepared input file and feed it to our search function. The first thing you will notice is that Jupyter asks you for a username and password.
If you don't have a set of credentials yet, head on over to the Demyst Console and sign up!
The search function only sends meta information to the Demyst Platform at this stage.
You can now browse through hundreds of data providers that could match against
your input file. Some of the providers shown require more input data than
we provided, though, so let's change our search query to only show data products
that can match the exact types we have in our input file by adding the
Enrich Input File
Now let's choose some data products and enrich our input file:
owler_search data products to enrich our
business dataset. We pass those into the enrich function which takes our input dataframe
as a parameter and returns a job ID. The enrichment job is asynchronous, so you will need
to store the ID in a variable and use it again in enrich_download.
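The submit-then-download pattern described above can be sketched as follows. The argument order of `enrich` (product names first, DataFrame second) and the shape of the downloaded result are assumptions, and `FakeAnalytics` is a hypothetical offline stand-in so the sketch runs without credentials.

```python
def enrich_and_download(analytics, providers, inputs):
    """Start an enrichment job, keep its ID, then fetch the results."""
    # Assumption: enrich takes the product names first, then the inputs.
    job_id = analytics.enrich(providers, inputs)
    # The job is asynchronous, so the ID is the only handle to it.
    results = analytics.enrich_download(job_id)
    return job_id, results


class FakeAnalytics:
    """Hypothetical stand-in for the real client, used only for this demo."""

    def enrich(self, providers, inputs):
        self.submitted = (list(providers), inputs)
        return "job-123"

    def enrich_download(self, job_id):
        return {"job_id": job_id, "rows": len(self.submitted[1])}


job_id, results = enrich_and_download(
    FakeAnalytics(),
    ["owler_search"],
    [{"business_name": "Acme Hardware"}],
)
print(job_id, results)
```

With the real package, the `analytics` object created earlier would take the place of `FakeAnalytics()`, and `inputs` would be the prepared DataFrame.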
By default enrich_download will block until
all the files are ready, but if you would like to download partial
files, you can by passing
Downloading partial results can be helpful when one of the data products you have selected is considerably slower than the others. enrich_status can also be used to help you understand how far along your enrichment job is.
That should get you accessing data! If you run into any problems, feel free to reach out to email@example.com!