class Study_overview
A class that generates comprehensive study overview reports by querying a Neo4j graph database and producing Excel files with ID mappings, audit logs, and Gantt chart visualizations of study progress.
File: /tf/active/vicechatdev/resources/documents.py
Lines: 150 - 223
Complexity: complex
Purpose
Study_overview is responsible for extracting and visualizing study-related data from a Neo4j database. It retrieves sample IDs, tracks task completion timelines across various laboratory processes (from quote to final report), and generates downloadable reports including Excel spreadsheets and interactive HTML Gantt charts. This class is designed for laboratory information management systems to provide stakeholders with a complete overview of study progress and sample tracking.
Source Code
class Study_overview():
    graph = Graph(config.DB_ADDR, auth=config.DB_AUTH, name=config.DB_NAME)

    def __init__(self, study):
        self.study=study
        self.id_table = self.get_ids(study)
        self.table_buffer, self.img_buffer = self.get_task_trail(study)
        self.files = [(self.id_table,f'{study}_IDs.xlsx'), (self.table_buffer,f'{study}_audit_log.xlsx'), (self.img_buffer,f'{study}_gantt_chart.html')]

    def get_ids(self, study):
        df=self.graph.run(f"""
            MATCH (:Study {{N:'{study}'}})-[*]->(g:Group)-->(e:ExpItem)-->(o:Organ) WHERE NOT g.N = 'Z' AND NOT o.external_N = 'None'
            RETURN DISTINCT e.N as CPathID, o.external_N as CustomerID ORDER BY CPathID""").to_data_frame()
        buffer = io.BytesIO()
        df.to_excel(buffer, index=False)
        buffer.seek(0)
        return buffer

    def get_task_trail(self, study):
        df = self.graph.run(f"""
            MATCH (s:Study {{N:'{study}'}})
            RETURN 'Quote' as Task, Date(s.quote) as Start, count(s) as Completed
            UNION ALL
            MATCH (s:Study {{N:'{study}'}})
            RETURN 'Study Plan' as Task, Date(s.studyplan) as Start, count(s) as Completed
            UNION ALL
            MATCH (s:Study {{N:'{study}'}})
            RETURN 'Delivery' as Task, Date(s.delivered) as Start, count(s) as Completed
            UNION ALL
            MATCH (s:Study {{N:'{study}'}})
            RETURN 'Draft Report' as Task, Date(s.draftreport) as Start, count(s) as Completed
            UNION ALL
            MATCH (s:Study {{N:'{study}'}})
            RETURN 'Final Report' as Task, Date(s.finalreport) as Start, count(s) as Completed
            UNION ALL
            MATCH (:Study {{N:'{study}'}})-[*]->(o:Organ)
            RETURN 'Sample Registration' as Task, Date(o.registered) as Start, count(o) as Completed
            UNION ALL
            MATCH (:Study {{N:'{study}'}})-[*]->(p:Parblock)
            RETURN 'Grossing' as Task, Date(p.created) as Start, count(p) as Completed
            UNION ALL
            MATCH (:Study {{N:'{study}'}})-[*]->(p:Parblock)
            RETURN 'Embedding' as Task, Date(p.embedded) as Start, count(p) as Completed
            UNION ALL
            MATCH (:Study {{N:'{study}'}})-[*]->(s:Slide)
            RETURN 'Sectioning' as Task, Date(s.created) as Start, count(s) as Completed
            UNION ALL
            MATCH (:Study {{N:'{study}'}})-[*]->(s:Slide)
            RETURN 'Staining' as Task, Date(s.stained) as Start, count(s) as Completed
            """).to_data_frame()
        df=df.dropna()
        df.Start=df.Start.apply(lambda x: x.to_native())
        df['End'] = df.Start + dt.timedelta(days=1)
        fig = px.timeline(df, x_start="Start", x_end="End", y="Task", color="Completed", custom_data=['Completed'],
                          category_orders={"Task":['Quote','Study Plan','Delivery','Sample Registration',
                                                   'Grossing','Embedding','Sectioning','Staining','Assessment',
                                                   'Draft Report','Final Report']},
                          title=f"{study} time table",
                          color_continuous_scale=[[0, '#F7D06D'], [1, '#3DCAB1']],)
        fig.update_layout({
            'plot_bgcolor': 'rgba(0, 0, 0, 0)',
        })
        config = dict({
            'displaylogo':False,
            'modeBarButtonsToRemove':['zoom','zoomIn','zoomOut','pan','lasso2d','select2d','autoScale','resetScale']
        })
        fig.update_traces(hovertemplate= "<b>%{y}</b><br> %{x}: %{customdata[0]}")
        table_buffer = io.BytesIO()
        df.to_excel(table_buffer, index=False)
        table_buffer.seek(0)
        figure_buffer = io.StringIO()
        fig.write_html(figure_buffer, config=config)
        figure_buffer.seek(0)
        return table_buffer, figure_buffer
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| bases | - | - | |
Parameter Details
study: The study identifier (string) used to query the Neo4j database. This should match the 'N' property of Study nodes in the graph database. It's used to filter all related data including groups, experimental items, organs, parblocks, and slides associated with this specific study.
Return Value
The constructor returns a Study_overview instance with pre-populated buffers containing study data. The instance has three main outputs stored in the 'files' attribute: (1) id_table - BytesIO buffer with Excel file mapping CPathIDs to CustomerIDs, (2) table_buffer - BytesIO buffer with Excel file containing task audit log data, (3) img_buffer - StringIO buffer with HTML Gantt chart visualization. Each method returns specific data: get_ids() returns BytesIO Excel buffer, get_task_trail() returns tuple of (BytesIO table buffer, StringIO HTML buffer).
Class Interface
Methods
__init__(self, study: str) -> None
Purpose: Initializes the Study_overview instance, executes all database queries, and generates all report buffers
Parameters:
study: String identifier for the study to generate reports for, must match Study.N property in Neo4j database
Returns: None - initializes instance with populated attributes
get_ids(self, study: str) -> io.BytesIO
Purpose: Queries the database for all CPathIDs and CustomerIDs associated with the study and returns them as an Excel file buffer
Parameters:
study: String identifier for the study to retrieve IDs for
Returns: BytesIO buffer containing an Excel file with columns 'CPathID' and 'CustomerID', sorted by CPathID
get_task_trail(self, study: str) -> tuple[io.BytesIO, io.StringIO]
Purpose: Queries the database for all task completion dates and counts, generates both an Excel audit log and an interactive Gantt chart visualization
Parameters:
study: String identifier for the study to retrieve task trail for
Returns: Tuple of (table_buffer, figure_buffer) where table_buffer is BytesIO containing Excel file with task data, and figure_buffer is StringIO containing HTML Gantt chart
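The date handling inside get_task_trail() can be illustrated without a database. The sketch below is not the class's code: it mimics, in plain Python, how tasks without a date are dropped and how each remaining task receives a one-day bar (End = Start + 1 day). The helper name to_gantt_rows and the sample rows are hypothetical, and unlike df.dropna() it checks only the Start field.

```python
import datetime as dt


def to_gantt_rows(rows):
    """Drop tasks without a date and give each remaining task a one-day bar."""
    gantt_rows = []
    for row in rows:
        if row["Start"] is None:
            # Roughly what df.dropna() does for a task whose date is not set
            continue
        gantt_rows.append({**row, "End": row["Start"] + dt.timedelta(days=1)})
    return gantt_rows


# Hypothetical query results; real rows come from the Cypher UNION ALL query.
sample_rows = [
    {"Task": "Quote", "Start": dt.date(2024, 1, 8), "Completed": 1},
    {"Task": "Final Report", "Start": None, "Completed": 1},
    {"Task": "Staining", "Start": dt.date(2024, 2, 1), "Completed": 40},
]
gantt_rows = to_gantt_rows(sample_rows)
```

Because every bar spans exactly one day, the chart shows when work on a task was recorded rather than how long the task took.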
Attributes
| Name | Type | Description | Scope |
|---|---|---|---|
| graph | Graph | Class-level Neo4j Graph database connection object shared across all instances, initialized with config settings | class |
| study | str | The study identifier passed to the constructor, stored for reference | instance |
| id_table | io.BytesIO | BytesIO buffer containing Excel file with CPathID to CustomerID mappings for the study | instance |
| table_buffer | io.BytesIO | BytesIO buffer containing Excel file with task audit log data including task names, start dates, and completion counts | instance |
| img_buffer | io.StringIO | StringIO buffer containing HTML representation of the interactive Gantt chart visualization | instance |
| files | list[tuple[io.BytesIO \| io.StringIO, str]] | List of tuples containing (buffer, filename) pairs for all three generated files: IDs Excel, audit log Excel, and Gantt chart HTML | instance |
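Since files holds (buffer, filename) pairs that mix binary and text buffers, one way to hand all three reports to a user is to pack them into a single in-memory ZIP archive. The following is a minimal sketch using only the standard library; bundle_files is a hypothetical helper, not part of the class, and the demo buffers are stand-ins for real report content.

```python
import io
import zipfile


def bundle_files(files):
    """Pack (buffer, filename) pairs into one in-memory ZIP archive."""
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for buffer, filename in files:
            # getvalue() returns bytes for BytesIO and str for StringIO;
            # writestr() accepts both and encodes str as UTF-8.
            zf.writestr(filename, buffer.getvalue())
    archive.seek(0)  # rewind so the caller can read from the start
    return archive


# Stand-in buffers with placeholder content instead of real reports
demo_files = [
    (io.BytesIO(b"excel bytes"), "DEMO_IDs.xlsx"),
    (io.StringIO("<html></html>"), "DEMO_gantt_chart.html"),
]
archive = bundle_files(demo_files)
```

With a real instance, the call would presumably be bundle_files(study_overview.files).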
Dependencies
- neo4j_driver
- datetime
- io
- json
- config
- python-docx
- pylibdmtx
- docxtpl
- PIL
- plotly
Required Imports
from neo4j_driver import *
import datetime as dt
import io
import json
import config
from docx import Document
from docx.enum.text import WD_ALIGN_PARAGRAPH
from docx.shared import RGBColor
from docx.shared import Pt
from docx.shared import Length
from docx.shared import Inches
from docx.shared import Cm
from docx.enum.table import WD_ROW_HEIGHT_RULE
from pylibdmtx.pylibdmtx import decode
from pylibdmtx.pylibdmtx import encode
from docxtpl import DocxTemplate
from docxtpl import InlineImage
from PIL import Image
import plotly.express as px
Usage Example
# Ensure config.py has DB_ADDR, DB_AUTH, DB_NAME defined
# from neo4j_driver import *
# import config
# from Study_overview import Study_overview
# Instantiate the class with a study identifier
study_overview = Study_overview('STUDY-2024-001')
# Access the generated files
id_excel_buffer, id_filename = study_overview.files[0]
audit_log_buffer, audit_filename = study_overview.files[1]
gantt_chart_buffer, gantt_filename = study_overview.files[2]
# Save files to disk
with open(id_filename, 'wb') as f:
    f.write(id_excel_buffer.getvalue())
with open(audit_filename, 'wb') as f:
    f.write(audit_log_buffer.getvalue())
# The Gantt chart is text (StringIO), so open in text mode with an explicit encoding
with open(gantt_filename, 'w', encoding='utf-8') as f:
    f.write(gantt_chart_buffer.getvalue())
# Or access individual components
id_table = study_overview.id_table
table_buffer = study_overview.table_buffer
img_buffer = study_overview.img_buffer
Best Practices
- Instantiate the class only when you need to generate reports, as it immediately executes database queries and generates all outputs in the constructor
- Ensure the Neo4j database connection is available before instantiation, as the class-level Graph object is shared across all instances
- The class uses Cypher queries with string interpolation - ensure study parameter is sanitized to prevent injection attacks
- All data is loaded into memory as buffers during initialization, so be mindful of memory usage for large studies
- The class creates three file buffers immediately upon instantiation; access them via the 'files' attribute as tuples of (buffer, filename)
- Each buffer is rewound to position 0 after it is written, so it can be read immediately or copied to a file
- The Gantt chart uses a fixed color scale and task order; modify the category_orders in get_task_trail() to change task ordering
- The class does not handle database connection errors; wrap instantiation in try-except blocks for production use
- The Graph object is a class variable shared across all instances, which may cause issues in multi-threaded environments
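- Rows with missing values are removed with dropna() before the remaining Neo4j date values are converted to native Python date values via to_native(); a task whose date is not set in the database therefore does not appear in the audit log or the Gantt chart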
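The injection concern above can be addressed in two complementary ways: validating the identifier before use, and passing it as a Cypher parameter instead of interpolating it into the query text. The sketch below shows both under stated assumptions: the allowed identifier format (letters, digits, underscores, hyphens) is a guess, and validate_study_id and ID_QUERY are hypothetical names that do not exist in the original class.

```python
import re

# Assumed identifier format: letters, digits, underscores and hyphens only.
STUDY_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def validate_study_id(study):
    """Return the identifier unchanged, or raise ValueError if it looks unsafe."""
    if not isinstance(study, str) or not STUDY_ID_PATTERN.fullmatch(study):
        raise ValueError(f"Invalid study identifier: {study!r}")
    return study


# Parameterized variant of the ID query: $study is a Cypher parameter, so the
# value is sent separately from the query text and is never parsed as Cypher.
ID_QUERY = """
MATCH (:Study {N: $study})-[*]->(g:Group)-->(e:ExpItem)-->(o:Organ)
WHERE NOT g.N = 'Z' AND NOT o.external_N = 'None'
RETURN DISTINCT e.N AS CPathID, o.external_N AS CustomerID ORDER BY CPathID
"""

# Assumption: if Graph follows the py2neo interface, the query could be run as
#     graph.run(ID_QUERY, study=validate_study_id(study)).to_data_frame()
```

Parameterization also removes the need for the doubled braces that the f-string version requires.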
Similar Components
AI-powered semantic similarity - components with related functionality:
- function get_training_overview (60.9% similar)
- class Tasklist (58.8% similar)
- class options (58.7% similar)
- class Total_tasks (56.7% similar)
- function generate_neo4j_schema_report (54.9% similar)