Since connecting to HANA from MicroStrategy ran quite smoothly, I was interested in whether some of the more extensive HANA capabilities could be used through MicroStrategy.
After sitting in on sessions at TechEd in Amsterdam, my interest was piqued by HANA’s text capabilities. I recently worked on a mobile dashboard where fuzzy search would have been a great feature.
First, I used HANA Studio to create a table with three columns (file_name, file_path, content), with content having the BLOB datatype.
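For anyone who would rather script this step than click through HANA Studio, the table could also be created over an ODBC connection. This is a sketch under assumptions: create_sessions_table is a hypothetical helper, and the NVARCHAR types and lengths are my guesses; only the column names, the BLOB type, and the schema and table names come from this post.

```python
def create_sessions_table(cursor, schema="SWERY", table="TECH_ED_SESSIONS"):
    """Create the three-column table (hypothetical helper, not from the post).

    cursor is any DB-API cursor, for example one obtained from pyodbc.
    Returns the DDL statement that was executed.
    """
    ddl = (
        'CREATE COLUMN TABLE "{0}"."{1}" ('
        "FILE_NAME NVARCHAR(256), "   # assumed type and length
        "FILE_PATH NVARCHAR(1024), "  # assumed type and length
        "CONTENT BLOB)"               # BLOB as described in the post
    ).format(schema, table)
    cursor.execute(ddl)
    return ddl
```

The query further down selects a column called PATH, so use whichever column name your table actually has.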
The next step was of course to get text data into HANA. As I wanted to limit the scope, I chose not to go the Data Services route. Instead, I found a post that mentioned using Python.
After some problems with the conversion to plain text, I found a tool called PDFMiner, which can convert a PDF to plain text. So I came up with the following script to load all PDFs in a folder as plain text into the HANA table.

    import os
    import subprocess

    import pyodbc
    import tkFileDialog  # Python 2 module (tkinter.filedialog in Python 3)

    # Ask for the folder that holds the PDFs and work from there
    path = tkFileDialog.askdirectory(title='Please select a directory')
    os.chdir(path)

    # Open one connection for the whole run
    cnxn = pyodbc.connect('DSN=Hana;UID=USER;PWD=PASSWORD')
    cursor = cnxn.cursor()

    for files in os.listdir("."):
        if files.endswith(".pdf"):
            file_path = os.path.abspath(files)
            file_name = files
            # Convert the PDF to plain text with PDFMiner's command-line script
            output = subprocess.Popen(["PDF2TXT.py", file_path], shell=True,
                                      cwd=r'c:\Python27\scripts',
                                      stdout=subprocess.PIPE)
            content = output.stdout.read()
            # Store the file name, its path and the extracted text
            cursor.execute("insert into TECH_ED_SESSIONS VALUES (?,?,?)",
                           (file_name, file_path, content))
            cnxn.commit()
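The same loading loop can also be written as a function that takes the database cursor and the text extractor as arguments, which makes it easy to try out without a HANA connection. This is a minimal sketch, not the script I actually ran: load_pdfs and its extract_text parameter are hypothetical names, and any callable that turns a PDF path into text (for example a wrapper around PDFMiner) could be passed in.

```python
import os


def load_pdfs(folder, cursor, extract_text, table="TECH_ED_SESSIONS"):
    """Insert one row (file_name, file_path, content) per PDF in folder.

    cursor is any DB-API cursor using '?' placeholders (as pyodbc does).
    extract_text is a caller-supplied callable: PDF path -> plain text.
    Returns the list of file names that were loaded.
    """
    loaded = []
    for file_name in sorted(os.listdir(folder)):
        # Skip anything that is not a PDF (the check ignores case)
        if not file_name.lower().endswith(".pdf"):
            continue
        file_path = os.path.abspath(os.path.join(folder, file_name))
        content = extract_text(file_path)
        cursor.execute(
            "insert into {0} VALUES (?,?,?)".format(table),
            (file_name, file_path, content),
        )
        loaded.append(file_name)
    return loaded
```

With pyodbc this would be called along the lines of load_pdfs(path, cnxn.cursor(), my_extractor), where my_extractor is whatever conversion function you supply, followed by cnxn.commit().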
In HANA, a normal column is fuzzy searchable, but for the BLOB datatype a full text index needs to be made first.
    CREATE FULLTEXT INDEX FTI_TECH_ED_SESSIONS ON "SWERY"."TECH_ED_SESSIONS"("CONTENT") FUZZY SEARCH INDEX ON
After that, the fuzzy search on the uploaded plain-text content ran without problems. It returned the file I had just loaded: the PDF never mentions HANO, but it mentions HANA frequently.
    SELECT FILE_NAME, PATH FROM "SWERY"."TECH_ED_SESSIONS" WHERE CONTAINS(CONTENT, 'HANO', FUZZY(0.8))
I then used this SQL statement in a freeform report in MicroStrategy, with a value prompt to enter the text being searched for. The MicroStrategy report behaved exactly as expected.
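Outside MicroStrategy, the value prompt corresponds to passing the search text as a query parameter. The sketch below makes a few assumptions: fuzzy_search is a hypothetical helper, the cursor is a pyodbc-style cursor with '?' placeholders, and I have not verified that every HANA driver version accepts a bound parameter inside CONTAINS, so treat it as a starting point. The threshold is written into the statement as a number after a range check.

```python
def fuzzy_search(cursor, term, threshold=0.8,
                 table='"SWERY"."TECH_ED_SESSIONS"'):
    """Return rows whose CONTENT fuzzily matches term (hypothetical helper)."""
    threshold = float(threshold)
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    sql = (
        "SELECT FILE_NAME, PATH FROM {0} "
        "WHERE CONTAINS(CONTENT, ?, FUZZY({1}))"
    ).format(table, threshold)
    # The search text is bound as a parameter, never pasted into the SQL
    cursor.execute(sql, (term,))
    return cursor.fetchall()
```

Calling fuzzy_search(cursor, "HANO") would then send the same statement as above, with the search text supplied separately.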
N.B. with thanks to Scott Wery