Study_overview - Code Extractor

class Study_overview

Maturity: 40

A class that generates comprehensive study overview reports by querying a Neo4j graph database and producing Excel files with ID mappings, audit logs, and Gantt chart visualizations of study progress.

File:
/tf/active/vicechatdev/resources/documents.py

Lines:
150 - 223

Complexity:
complex

Purpose

Study_overview is responsible for extracting and visualizing study-related data from a Neo4j database. It retrieves sample IDs, tracks task completion timelines across various laboratory processes (from quote to final report), and generates downloadable reports including Excel spreadsheets and interactive HTML Gantt charts. This class is designed for laboratory information management systems to provide stakeholders with a complete overview of study progress and sample tracking.

Source Code

class Study_overview():
    graph = Graph(config.DB_ADDR, auth=config.DB_AUTH, name=config.DB_NAME)
    
    def __init__(self, study):
        self.study=study
        self.id_table = self.get_ids(study)
        self.table_buffer, self.img_buffer = self.get_task_trail(study)
        self.files = [(self.id_table,f'{study}_IDs.xlsx'), (self.table_buffer,f'{study}_audit_log.xlsx'), (self.img_buffer,f'{study}_gantt_chart.html')]
    
    def get_ids(self, study):
        df=self.graph.run(f"""
MATCH (:Study {{N:'{study}'}})-[*]->(g:Group)-->(e:ExpItem)-->(o:Organ) WHERE NOT g.N = 'Z' AND NOT o.external_N = 'None'
RETURN DISTINCT e.N as CPathID, o.external_N as CustomerID ORDER BY CPathID""").to_data_frame()
        buffer = io.BytesIO()
        df.to_excel(buffer, index=False)
        buffer.seek(0)
        return buffer
    
    def get_task_trail(self, study):
        df = self.graph.run(f"""
            MATCH (s:Study {{N:'{study}'}})
            RETURN 'Quote' as Task, Date(s.quote) as Start, count(s) as Completed
            UNION ALL
            MATCH (s:Study {{N:'{study}'}})
            RETURN 'Study Plan' as Task, Date(s.studyplan) as Start, count(s) as Completed
            UNION ALL
            MATCH (s:Study {{N:'{study}'}})
            RETURN 'Delivery' as Task, Date(s.delivered) as Start, count(s) as Completed
            UNION ALL
            MATCH (s:Study {{N:'{study}'}})
            RETURN 'Draft Report' as Task, Date(s.draftreport) as Start, count(s) as Completed
            UNION ALL
            MATCH (s:Study {{N:'{study}'}})
            RETURN 'Final Report' as Task, Date(s.finalreport) as Start, count(s) as Completed
            UNION ALL
            MATCH (:Study {{N:'{study}'}})-[*]->(o:Organ)
            RETURN 'Sample Registration' as Task, Date(o.registered) as Start, count(o) as Completed
            UNION ALL
            MATCH (:Study {{N:'{study}'}})-[*]->(p:Parblock)
            RETURN 'Grossing' as Task, Date(p.created) as Start, count(p) as Completed
            UNION ALL
            MATCH (:Study {{N:'{study}'}})-[*]->(p:Parblock)
            RETURN 'Embedding' as Task, Date(p.embedded) as Start, count(p) as Completed
            UNION ALL
            MATCH (:Study {{N:'{study}'}})-[*]->(s:Slide) 
            RETURN 'Sectioning' as Task, Date(s.created) as Start, count(s) as Completed
            UNION ALL
            MATCH (:Study {{N:'{study}'}})-[*]->(s:Slide) 
            RETURN 'Staining' as Task, Date(s.stained) as Start, count(s) as Completed
        """).to_data_frame()
        df=df.dropna()
        df.Start=df.Start.apply(lambda x: x.to_native())
        df['End'] = df.Start + dt.timedelta(days=1)
        fig = px.timeline(df, x_start="Start", x_end="End", y="Task", color="Completed", custom_data=['Completed'],
                         category_orders={"Task":['Quote','Study Plan','Delivery','Sample Registration',
                                                  'Grossing','Embedding','Sectioning','Staining','Assessment',
                                                  'Draft Report','Final Report']},
                         title=f"{study} time table",
                         color_continuous_scale=[[0, '#F7D06D'], [1, '#3DCAB1']],)
        fig.update_layout({
        'plot_bgcolor': 'rgba(0, 0, 0, 0)',
        })
        config = dict({
            'displaylogo':False,
            'modeBarButtonsToRemove':['zoom','zoomIn','zoomOut','pan','lasso2d','select2d','autoScale','resetScale']
        }) 
        fig.update_traces(hovertemplate= "<b>%{y}</b><br> %{x}: %{customdata[0]}")
        table_buffer = io.BytesIO()
        df.to_excel(table_buffer, index=False)
        table_buffer.seek(0)
        figure_buffer = io.StringIO()
        fig.write_html(figure_buffer, config=config)
        figure_buffer.seek(0)
        return table_buffer, figure_buffer

Parameters

Name	Type	Default	Kind
`study`	str	-	positional or keyword

Parameter Details

study: The study identifier (string) used to query the Neo4j database. This should match the 'N' property of Study nodes in the graph database. It's used to filter all related data including groups, experimental items, organs, parblocks, and slides associated with this specific study.

Return Value

Instantiating the class returns a Study_overview instance whose buffers are already populated. The 'files' attribute lists the three outputs as (buffer, filename) pairs: (1) id_table, a BytesIO buffer holding an Excel file that maps CPathIDs to CustomerIDs; (2) table_buffer, a BytesIO buffer holding the Excel task audit log; (3) img_buffer, a StringIO buffer holding the HTML Gantt chart. Of the methods, get_ids() returns the BytesIO Excel buffer and get_task_trail() returns a tuple of (BytesIO table buffer, StringIO HTML buffer).

Class Interface

Methods

`__init__(self, study: str) -> None`

Purpose: Initializes the Study_overview instance, executes all database queries, and generates all report buffers

Parameters:

study: String identifier for the study to generate reports for, must match Study.N property in Neo4j database

Returns: None - initializes instance with populated attributes

`get_ids(self, study: str) -> io.BytesIO`

Purpose: Queries the database for all CPathIDs and CustomerIDs associated with the study and returns them as an Excel file buffer

Parameters:

study: String identifier for the study to retrieve IDs for

Returns: BytesIO buffer containing an Excel file with columns 'CPathID' and 'CustomerID', sorted by CPathID

`get_task_trail(self, study: str) -> tuple[io.BytesIO, io.StringIO]`

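Purpose: Queries the database for task dates and per-date counts across ten workflow steps (Quote through Staining), drops rows without a date, gives each remaining row a one-day duration (End = Start + 1 day), and generates both an Excel audit log and an interactive Gantt chart visualization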

Parameters:

study: String identifier for the study to retrieve task trail for

Returns: Tuple of (table_buffer, figure_buffer) where table_buffer is BytesIO containing Excel file with task data, and figure_buffer is StringIO containing HTML Gantt chart
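
The row clean-up that get_task_trail() performs before plotting can be illustrated without a database. The following is a minimal sketch in plain Python rather than pandas; the helper name add_one_day_bars and the sample rows are hypothetical and only mirror the dropna() and one-day End steps shown in the source code.

```python
import datetime as dt


def add_one_day_bars(rows):
    """Drop rows without a Start date and give the rest a one-day span."""
    bars = []
    for row in rows:
        if row["Start"] is None:  # plays the role of df.dropna()
            continue
        end = row["Start"] + dt.timedelta(days=1)  # mirrors df['End'] = df.Start + 1 day
        bars.append({**row, "End": end})
    return bars


# Hypothetical query results: one row per task and date.
rows = [
    {"Task": "Quote", "Start": dt.date(2024, 1, 8), "Completed": 1},
    {"Task": "Final Report", "Start": None, "Completed": 1},  # step not reached yet
    {"Task": "Staining", "Start": dt.date(2024, 2, 1), "Completed": 12},
]
bars = add_one_day_bars(rows)
```

Tasks whose date property is still empty therefore disappear from both the audit log and the chart, and every remaining bar is exactly one day wide.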

Attributes

Name	Type	Description	Scope
`graph`	Graph	Class-level Neo4j Graph database connection object shared across all instances, initialized with config settings	class
`study`	str	The study identifier passed to the constructor, stored for reference	instance
`id_table`	io.BytesIO	BytesIO buffer containing Excel file with CPathID to CustomerID mappings for the study	instance
`table_buffer`	io.BytesIO	BytesIO buffer containing Excel file with the task audit log: one row per task and date, with columns Task, Start, Completed, and End	instance
`img_buffer`	io.StringIO	StringIO buffer containing HTML representation of the interactive Gantt chart visualization	instance
`files`	list[tuple[io.BytesIO \| io.StringIO, str]]	List of tuples containing (buffer, filename) pairs for all three generated files: IDs Excel, audit log Excel, and Gantt chart HTML	instance
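
Because 'files' mixes BytesIO and StringIO buffers, code that consumes it has to handle both bytes and text. As a minimal sketch, assuming the reports should be offered as a single download, the hypothetical helper bundle_files below packs the (buffer, filename) pairs into one in-memory ZIP archive using only the standard library.

```python
import io
import zipfile


def bundle_files(files):
    """Pack (buffer, filename) pairs into a single in-memory ZIP archive."""
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for buffer, filename in files:
            data = buffer.getvalue()
            # StringIO yields str (the HTML chart); ZIP entries need bytes.
            if isinstance(data, str):
                data = data.encode("utf-8")
            zf.writestr(filename, data)
    archive.seek(0)  # rewind so the caller can read from the start
    return archive
```

With a populated instance this would be called as bundle_files(study_overview.files); getvalue() does not depend on the buffer position, so the source buffers remain usable afterwards.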

Dependencies

neo4j_driver
datetime
io
json
config
python-docx
pylibdmtx
docxtpl
PIL
plotly

Required Imports

from neo4j_driver import *
import datetime as dt
import io
import json
import config
from docx import Document
from docx.enum.text import WD_ALIGN_PARAGRAPH
from docx.shared import RGBColor
from docx.shared import Pt
from docx.shared import Length
from docx.shared import Inches
from docx.shared import Cm
from docx.enum.table import WD_ROW_HEIGHT_RULE
from pylibdmtx.pylibdmtx import decode
from pylibdmtx.pylibdmtx import encode
from docxtpl import DocxTemplate
from docxtpl import InlineImage
from PIL import Image
import plotly.express as px

Usage Example

# Ensure config.py has DB_ADDR, DB_AUTH, DB_NAME defined
# from neo4j_driver import *
# import config
# from resources.documents import Study_overview  # module path assumed from the File entry above

# Instantiate the class with a study identifier
study_overview = Study_overview('STUDY-2024-001')

# Access the generated files
id_excel_buffer, id_filename = study_overview.files[0]
audit_log_buffer, audit_filename = study_overview.files[1]
gantt_chart_buffer, gantt_filename = study_overview.files[2]

# Save files to disk
with open(id_filename, 'wb') as f:
    f.write(id_excel_buffer.getvalue())

with open(audit_filename, 'wb') as f:
    f.write(audit_log_buffer.getvalue())

with open(gantt_filename, 'w', encoding='utf-8') as f:
    f.write(gantt_chart_buffer.getvalue())

# Or access individual components
id_table = study_overview.id_table
table_buffer = study_overview.table_buffer
img_buffer = study_overview.img_buffer
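
The usage example above assumes that every query succeeds. Since the constructor does all the work, a failure surfaces as an exception at instantiation. The following is a minimal sketch of a guard around it; the helper name build_overview is hypothetical, and it catches Exception broadly only because the exception types raised by the database layer are not shown in the source.

```python
import logging

logger = logging.getLogger(__name__)


def build_overview(factory, study):
    """Return factory(study), or None if report generation fails."""
    try:
        return factory(study)
    except Exception:
        # Assumption: narrow this to the driver's own exception types once known.
        logger.exception("Could not build overview for study %r", study)
        return None
```

It would be called as build_overview(Study_overview, 'STUDY-2024-001'), with the caller checking for None before touching 'files'.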

Best Practices

Instantiate the class only when you need to generate reports: the constructor immediately runs every database query and builds all three outputs
The class-level Graph object is created when the class body is executed (at module import) and is shared across all instances, so make sure the Neo4j database is reachable by then
The Cypher queries are built by interpolating the study parameter into the query string; validate or sanitize it, or preferably pass it as a query parameter, to prevent Cypher injection
All data is held in memory as buffers after initialization, so be mindful of memory usage for large studies
Access the three buffers via the 'files' attribute as (buffer, filename) tuples, or individually via id_table, table_buffer, and img_buffer
Each buffer is rewound to position 0 after it is written, so it can be read or copied to a file straight away
The Gantt chart uses a fixed color scale and task order; edit category_orders in get_task_trail() to change the ordering. Note that category_orders lists an 'Assessment' task that none of the queries returns, so it never appears in the chart
The class does not handle database errors; wrap instantiation in try-except blocks for production use
Because the Graph object is a class variable shared across all instances, it may cause issues in multi-threaded environments
Date values are converted from Neo4j date types to native Python date objects with to_native(); rows whose date is null are removed by dropna() before the conversion
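
The injection risk noted above can be avoided by binding the study name as a Cypher parameter instead of formatting it into the query text. The following is a minimal sketch for the get_ids() query, assuming that Graph is py2neo's Graph (which the name= argument and to_data_frame() suggest) and that its run() accepts a parameter dictionary as second argument. IDS_QUERY, run_ids_query and RecordingGraph are hypothetical names; RecordingGraph only stands in for the real connection so the sketch runs without a database.

```python
IDS_QUERY = """
MATCH (:Study {N: $study})-[*]->(g:Group)-->(e:ExpItem)-->(o:Organ)
WHERE NOT g.N = 'Z' AND NOT o.external_N = 'None'
RETURN DISTINCT e.N AS CPathID, o.external_N AS CustomerID
ORDER BY CPathID
"""


def run_ids_query(graph, study):
    """Run the ID query with the study name bound as a parameter."""
    # The value travels separately from the query text, so quotes or Cypher
    # fragments inside `study` are treated as data, not as query syntax.
    return graph.run(IDS_QUERY, {"study": study})


class RecordingGraph:
    """Hypothetical stand-in for the real Graph; it only records the call."""

    def __init__(self):
        self.calls = []

    def run(self, cypher, parameters=None):
        self.calls.append((cypher, parameters))
        return []


graph = RecordingGraph()
run_ids_query(graph, "X'}) DETACH DELETE n //")
```

The query text stays constant whatever the caller passes in, and the doubled braces needed by the f-string version are no longer required.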

Similar Components

AI-powered semantic similarity - components with related functionality:

function get_training_overview 60.9% similar

Retrieves a comprehensive training overview for the admin panel, including training plans, active assignments, and recent completions from a Neo4j graph database.
From: /tf/active/vicechatdev/CDocs/controllers/training_controller.py

class Tasklist 58.8% similar

A class for tracking and managing the status of study tasks through a Neo4j graph database, monitoring progress through a predefined sequence of workflow steps.
From: /tf/active/vicechatdev/resources/taskmanager.py

class options 58.7% similar

A Panel-based UI class for managing slide release visibility in a study management system, allowing users to view and toggle the release status of slides at various hierarchical levels (Study, Group, Animal, Block, Slide).
From: /tf/active/vicechatdev/options.py

class Total_tasks 56.7% similar

A class that retrieves and manages an overview of all current tasks from a Neo4j graph database, organized by task type and filtered by usergroup.
From: /tf/active/vicechatdev/resources/taskmanager.py

function generate_neo4j_schema_report 54.9% similar

Generates a comprehensive schema report of a Neo4j graph database, including node labels, relationships, properties, constraints, indexes, and sample data, outputting multiple file formats (JSON, HTML, Python snippets, Cypher examples).
From: /tf/active/vicechatdev/neo4j_schema_report.py
