Workflow#
Maestro does try to create an index based on the configuration but it has basic initial mapping, that mapping is not configurable in Runtime, evolving the mapping is the user's responsibility to do through Elasticsearch APIs.
Mapping Changes#
Since Maestro works with a dynamic analysis schema, that can change in runtime the index will need to adapt and Maestro supports that since it can capture and pass along all the new fields added to the analyses in SONG, it tries to stay out of the way as possible. This dynamic model requires a proper migration process to be practiced by the users to allow their Index to evolve along their model, the process will be something like:
- Index is created (either manually or by Maestro)
- Maestro runs and start indexing analysis.
- SONG introduces new analysis types with new fields
- Maestro will continue working and indexing those documents but new fields won't be indexed yet.
- Index mapping needs update due to new analysis types, or change of some structure etc.. :
- Create new index with the updated mapping
- Reindex your data (you can point maestro to your new index and trigger indexing on full repositories to do that,
or you can use
/reindex
API in Elasticsearch). - switch your Aliases if it still points to the old index
- now that your data is migrated Maestro will be indexing based on the new mapping.
How to Index#
Maestro can be used through either a message driven kafka topic or an HTTP json API
Http API#
- Index by Study:
POST http://maestro.host:11235/index/repository/<repo>/study/<studyId>
curl -X POST \
http://localhost:11235/index/repository/collab/study/BASH-AR \
-H 'Content-Type: application/json' \
-H 'cache-control: no-cache' \
-d '{}'
- Index by Analysis:
POST http://maestro.host:11235/index/repository/<repo>/study/<studyId>/analysis/<analysisId>
curl -X POST \
http://localhost:11235/index/repository/collab/study/BASH-AR/analysis/ad7cabf8-df45-40fe6 \
-H 'Content-Type: application/json' \
-H 'cache-control: no-cache'
- Index an entire repository:
POST http://maestro.host:11235/index/repository/<repo-code>
curl -X POST \
http://localhost:11235/index/repository/collab \
-H 'Content-Type: application/json' \
-H 'cache-control: no-cache'
Kafka topics#
Maestro can be configured as mentioned under the running configurations section to listen to kafka topics
spring:
application:
name: maestro
output.ansi.enabled: ALWAYS
cloud:
stream:
# kafka integration with song (remove this key to disable kafka)
kafka:
binder:
brokers: localhost:9092
bindings:
songInput:
consumer:
enableDlq: true
dlqName: maestro_song_analysis_dlq
autoCommitOnError: true
autoCommitOffset: true
input:
consumer:
enableDlq: true
dlqName: maestro_index_requests_dlq
autoCommitOnError: true
autoCommitOffset: true
bindings:
input:
# we don't specify content type because @StreamListener will handle that
destination: maestro_index_requests
group: requestsConsumerGrp
consumer:
maxAttempts: 1
songInput:
destination: song-analysis
group: songConsumerGrp
consumer:
maxAttempts: 1
The maestro_index_requests
topic is for on demand request message instead of using the web api above
the body of the messages should be a JSON, and looks like one of the following:
- Analysis:
{"value" : { "repositoryCode" : "collab", "studyId" : "PEK-AB", "analysisId" : "EGAZ000", "remove": true } }
- Study:
{"value" : { "repositoryCode" : "collab", "studyId" : "PEK-AB" } }
- Full repository (SONG):
{"value" : { "repositoryCode" : "aws" } }
for song-nalysis
topic messages, the message schemas are governed by SONG but they currently look like this:
{"value" : { "analysisId" : "12314124", "studyId" : "PEK-AB", "songServerId": "collab", "state": "PUBLISHED" } }