QueryAnalysis

Getting Started

Prerequisites

You need to have Maven, OpenJDK 8 and Python 3 installed.
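Before building, it can help to confirm that these tools are actually on your PATH and that the JDK is version 8. The following is a minimal sketch and is not part of the repository. It assumes the executables are named mvn, java and python3, and that `java -version` prints its usual banner (for example `openjdk version "1.8.0_292"`) to stderr. The helper names are hypothetical.

```python
import re
import shutil
import subprocess

# Executables the README lists as prerequisites (Maven, a JDK, Python 3).
# The names are an assumption about how they are installed on your system.
REQUIRED_TOOLS = ("mvn", "java", "python3")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the required executables that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


def parse_java_major(banner):
    """Extract the major Java version from a `java -version` banner line.

    Java 8 reports itself as "1.8.0_xxx", while Java 9+ reports e.g. "11.0.2".
    Returns None if no version string is found.
    """
    match = re.search(r'version "([^"]+)"', banner)
    if match is None:
        return None
    parts = match.group(1).split(".")
    # Legacy version scheme: "1.8" means Java 8.
    candidate = parts[1] if parts[0] == "1" and len(parts) > 1 else parts[0]
    digits = re.match(r"\d+", candidate)
    return int(digits.group()) if digits else None


def installed_java_major():
    """Run `java -version` (which prints to stderr) and parse its first line."""
    if shutil.which("java") is None:
        return None
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    lines = (result.stderr or result.stdout).splitlines()
    return parse_java_major(lines[0]) if lines else None


if __name__ == "__main__":
    print("Missing tools:", missing_tools() or "none")
    print("Java major version:", installed_java_major())
```

If the reported major version is not 8, you may need to point JAVA_HOME at an OpenJDK 8 installation before running Maven.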

Installing

$ mvn clean package

Running the main Java log analyser

(Note that you probably want to run the Python harness described below instead, because it sets the necessary arguments and creates the required directories.)

# Processes the example SPARQL log files into exampleMonthsFolder/exampleMonth/processedLogData
$ mvn exec:java@QueryAnalysis -Dexec.args="-w exampleMonthsFolder/exampleMonth -logging"

# More (useful) CLI parameters are available; list them with:
$ mvn exec:java@QueryAnalysis -Dexec.args="--help"
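If you script this call yourself rather than using the Python harness, note that the double quotes around -Dexec.args are shell syntax: all analyser options travel inside a single Maven property. The sketch below shows how that maps to an argument list for Python's subprocess module. The helper names are hypothetical (they are not part of tools/), and the sketch assumes the month path and extra options contain no spaces.

```python
import shlex
import subprocess


def build_analyser_command(month_dir, extra_args=()):
    """Build the argv list for running the Java log analyser through Maven.

    In the shell example the double quotes only group the analyser options;
    with a subprocess argument list, the whole option string becomes one
    "-Dexec.args=..." element. Assumes no spaces in month_dir or extra_args.
    """
    analyser_args = ["-w", month_dir, *extra_args]
    return ["mvn", "exec:java@QueryAnalysis", "-Dexec.args=" + " ".join(analyser_args)]


def run_analyser(month_dir, extra_args=()):
    """Run the analyser from the repository root; raises CalledProcessError on failure."""
    subprocess.run(build_analyser_command(month_dir, extra_args), check=True)


if __name__ == "__main__":
    # Print the equivalent shell command instead of actually running Maven.
    print(shlex.join(build_analyser_command("exampleMonthsFolder/exampleMonth", ["-logging"])))
```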

Important: To avoid flooding the command line with error messages, all uncaught runtime exceptions are written to the log files in the logs/ folder, so check those regularly. Logs are not generated by default; enable them with the -l option.
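Because exceptions end up in files rather than on the console, a quick tally of exception types can make these logs easier to review. This is a rough sketch, not a tool shipped with the repository. It assumes the files under logs/ are plain text containing Java stack-trace lines such as `java.lang.IllegalStateException: ...`; the actual file names and log format are not documented here.

```python
import re
from collections import Counter
from pathlib import Path

# Fully qualified or simple Java class names ending in "Exception",
# e.g. "java.lang.IllegalStateException" or "ParseException".
EXCEPTION_NAME = re.compile(r"\b((?:[A-Za-z_$][\w$]*\.)*[\w$]*Exception)\b")


def summarize_exceptions(log_dir="logs"):
    """Count exception class names mentioned in every file below log_dir."""
    counts = Counter()
    root = Path(log_dir)
    if not root.is_dir():
        return counts
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line in handle:
                counts.update(EXCEPTION_NAME.findall(line))
    return counts


if __name__ == "__main__":
    # Show the ten most frequent exception types.
    for name, count in summarize_exceptions().most_common(10):
        print(f"{count:6d}  {name}")
```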

Running the QueryAnalysis script

The QueryAnalysis script handles both steps: extraction using Hive and processing using the Java application. Extraction using Hive only works on the server, and it is skipped if the month already exists in the months folder. To run the QueryAnalysis script locally, you need to provide the local months folder with the -m option.

# The -l option enables logging
$ python3 -m tools.QueryAnalysis exampleMonth -m ../exampleMonthsFolder -l

# You can also specify multiple months in the same directory by separating them using commas
$ python3 -m tools.QueryAnalysis exampleMonth,otherMonth -m ../exampleMonthsFolder -l
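The rule above (Hive extraction is skipped when a month already exists in the months folder) can be illustrated with a short sketch that splits the comma-separated month argument and checks each month folder. This is a hypothetical illustration of the described behaviour, not the code in tools/QueryAnalysis.py; in particular, stripping whitespace around month names is an assumption.

```python
import os


def parse_months(spec):
    """Split a comma-separated month argument such as "exampleMonth,otherMonth"."""
    return [month.strip() for month in spec.split(",") if month.strip()]


def plan_months(spec, months_folder):
    """Decide, per month, whether Hive extraction would be needed.

    Mirrors the README's rule: extraction is skipped when the month's folder
    already exists in the months folder; otherwise it must run on the server.
    """
    plan = {}
    for month in parse_months(spec):
        if os.path.isdir(os.path.join(months_folder, month)):
            plan[month] = "process only"
        else:
            plan[month] = "extract on server, then process"
    return plan


if __name__ == "__main__":
    print(plan_months("exampleMonth,otherMonth", "../exampleMonthsFolder"))
```

Run locally, any month reported as needing extraction will fail unless its folder has been copied into the months folder first.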

You will also need to update tools/config.py so that its paths point to your own directories.

Running the Anonymization script

After you've extracted the raw query data (done as the first step in the QueryAnalysis script above), you can anonymize the extracted queries for the specified month(s).

$ python3 -m tools.Anonymize exampleMonth -m exampleMonthsFolder/ -l

Caveat emptor

This code won't work on the server in its current form. The Hive call in tools/QueryAnalysis.py needs to be updated with the current location of the relevant data; see the comments in that file.

Depending on what is ultimately extracted from logs, downstream changes in the Java code will probably be necessary. For example, InputHandlerTSV.java currently expects a URL-encoded query, but as of March 2026 the logs are not stored in this form.
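One possible stopgap, instead of changing the Java side, is to URL-encode plain queries before they are written to the TSV input. URL-encoding also replaces tabs and newlines inside a query, which would otherwise break the TSV columns. This sketch assumes InputHandlerTSV.java decodes with java.net.URLDecoder (form encoding, where '+' means a space) and UTF-8; check the Java source before relying on it. The helper names are hypothetical.

```python
from urllib.parse import quote_plus, unquote_plus


def encode_query_for_tsv(raw_query):
    """URL-encode a plain SPARQL query in application/x-www-form-urlencoded style.

    Tabs and newlines become %09 and %0A, so the query can no longer break
    the column structure of a TSV line.
    """
    return quote_plus(raw_query, encoding="utf-8")


def decode_query(encoded_query):
    """Inverse of encode_query_for_tsv ('+' decodes to a space, %XX to bytes)."""
    return unquote_plus(encoded_query, encoding="utf-8")


if __name__ == "__main__":
    query = 'SELECT ?item WHERE {\n\t?item wdt:P31 wd:Q5 .\n} LIMIT 10'
    encoded = encode_query_for_tsv(query)
    print(encoded)
    # Round trip: decoding must give back the original query.
    assert decode_query(encoded) == query
```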

Additionally, it will be important to verify that the call to StandardizingSPARQLParser.anonymize() in OutputHandlerAnonymizer.java correctly anonymizes the input queries.
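A cheap spot check for that verification is to collect pairs of input and anonymized queries and confirm that no string literal from an input survives verbatim in its output. The sketch below is a heuristic, not a description of how OutputHandlerAnonymizer works. It only recognises short "..." and '...' literals (not """...""" long literals or comments), short literals such as "en" can give false positives, and the "string1" placeholder in the usage example is made up.

```python
import re

# Simplified SPARQL short string literals: "..." or '...' with backslash escapes.
# Long literals ("""...""") and comments are not handled.
DOUBLE_QUOTED = r'"((?:[^"\\\n]|\\.)*)"'
SINGLE_QUOTED = r"'((?:[^'\\\n]|\\.)*)'"
STRING_LITERAL = re.compile(DOUBLE_QUOTED + "|" + SINGLE_QUOTED)


def string_literals(query):
    """Return the non-empty string literal contents found in a SPARQL query."""
    return [double or single for double, single in STRING_LITERAL.findall(query) if double or single]


def leaked_literals(original, anonymized):
    """Return literals from the original query that survive verbatim in the output."""
    return [literal for literal in string_literals(original) if literal in anonymized]


if __name__ == "__main__":
    original = 'SELECT ?p WHERE { ?p rdfs:label "Douglas Adams"@en . }'
    # Made-up anonymized output, used only to exercise the check.
    anonymized = 'SELECT ?var1 WHERE { ?var1 rdfs:label "string1"@en . }'
    print(leaked_literals(original, anonymized) or "no literals leaked")
```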

License

The code in this repository is released under the Apache 2.0 license. External libraries used may have their own licensing terms.
