Gen AI Analytics (G-Mode)

GenHealth.ai provides an analytics interface for customers running our model on population-level data. More about this product: https://genhealth.ai/product/gen-ai-population-health-analytics

Data Structure

Data stored in a database at GenHealth.ai, including both patient histories and predicted futures, can be queried using natural language text or SQL. At minimum, your account will have the stacked_lff table, which has the following columns:

  1. ghid (int) - A unique identifier assigned by GenHealth.ai to each patient, serving as a key reference for integrating and managing patient-specific data across different datasets and systems.
  2. prediction_id (int) - An identifier that links a specific prediction or analysis instance to the patient. This enables tracking and referencing predictions over time, facilitating longitudinal studies and outcome monitoring.
  3. date (date) - The date associated with the data entry, which could either reflect the date when a particular health event occurred (for actual context) or when a predicted event is expected to happen (for predicted future).
  4. token_type (string) - Describes the category or nature of the token in the following column. This classification aids in understanding the context or significance of the token, such as identifying it as a symptom, diagnosis, intervention, or any other event or entity relevant to the patient's health journey.
    1. SEQ_START: Indicates the beginning of a patient's history sequence.
    2. SEQ_END: Marks the end of a patient's history sequence.
    3. RACE: Accompanied by a token indicating the patient’s race or ethnicity.
    4. NOT_ON_ADMISSION: Indicates a diagnosis not present on admission.
    5. UNCLEAR_IF_ON_ADMISSION: Unclear if the diagnosis was present on admission.
    6. AGE: Accompanied by a token indicating the patient's age.
    7. DRUG_TYPE_KEY: Indicates that the subsequent events are drug types.
    8. HCPCS_MODIFIER: Accompanied by a token indicating HCPCS modifier codes.
    9. PLACE_OF_SERVICE: Specified by various tokens indicating the location of service.
    10. COST: Indicates the total cost associated with care, in dollar amounts.
    11. DRUG_TEXT: Describes drugs or medications the patient is taking, including dosage.
    12. TIME_GAP: Indicates the number of days between events.
    13. ENROLLMENT_START: Marks the beginning of an enrollment period.
    14. CPT_HCPCS_CODE: Accompanied by a token indicating the CPT or HCPCS code.
    15. PLAN_TYPE: Indicates the type of health plan the patient is enrolled in.
    16. ICD_10_CM: Accompanied by a token indicating the ICD-10-CM condition or diagnosis code.
    17. ICD10PCS: Indicates the ICD-10-PCS procedure code.
    18. ENROLLMENT_END: Marks the end of an enrollment period.
    19. DECEASED: Indicates that the patient has died.
    20. NEW_YEAR: Marks the beginning of a new year.
    21. GENDER: Accompanied by a token indicating the patient's gender.
    22. TOTAL_CLAIM_COST_KEY: Followed by tokens indicating total claim costs.
    23. YEAR: Accompanied by a token indicating the calendar year.
    24. ZIP: Accompanied by a token indicating the five-digit ZIP code.
  5. token (string) - This field contains the specific token related to the patient's health record. Depending on the token_type, it might represent a specific symptom, diagnosis, medication, or another relevant piece of health information. The presence of a token is conditional; it may be included to specify details following the token_type, or it may be omitted if the token_type alone sufficiently represents an event or condition.
  6. seq_type (string) - Indicates the temporal context of the row, with two possible values: 'actual_context' or 'predicted_future'. 'actual_context' means the row belongs to the patient's recorded history, that is, events that have already occurred. 'predicted_future' means the row is an event anticipated by predictive modeling. This distinction separates recorded health events from projected ones, so a patient's trajectory can be followed from past events through to predictions.
  7. seq_idx (int) - Indicates the ordering of the event within this sequence or prediction_id.
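
As a minimal sketch of how these columns work together, the Python snippet below loads a few rows into an in-memory SQLite table shaped like stacked_lff and counts diagnosis codes per seq_type. SQLite is only a stand-in here, which is an assumption: the SQL dialect accepted by GenHealth.ai may differ. The helper names (build_demo_db, diagnoses_by_seq_type) are hypothetical, and the predicted_future row is invented for illustration.

```python
import sqlite3

# Illustrative rows laid out as (ghid, prediction_id, date, token_type,
# token, seq_type, seq_idx). The last row is hypothetical.
ROWS = [
    (11535, 0, "2020-07-11", "ICD_10_CM", "E11.65", "actual_context", 1987),
    (11535, 0, "2020-07-11", "NOT_ON_ADMISSION", None, "actual_context", 1988),
    (11535, 0, "2020-07-11", "CPT_HCPCS_CODE", "G0108", "actual_context", 1990),
    (11535, 1, "2020-09-01", "ICD_10_CM", "E11.65", "predicted_future", 0),
]

# Separate recorded diagnoses from predicted ones.
DIAGNOSES_BY_SEQ_TYPE = """
SELECT seq_type, token, COUNT(*) AS n
FROM stacked_lff
WHERE token_type = 'ICD_10_CM'
GROUP BY seq_type, token
ORDER BY seq_type, token
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the stacked_lff table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stacked_lff (ghid INTEGER, prediction_id INTEGER, "
        "date TEXT, token_type TEXT, token TEXT, seq_type TEXT, seq_idx INTEGER)"
    )
    conn.executemany("INSERT INTO stacked_lff VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def diagnoses_by_seq_type(conn: sqlite3.Connection) -> list:
    """Return (seq_type, token, count) tuples for ICD-10-CM rows."""
    return conn.execute(DIAGNOSES_BY_SEQ_TYPE).fetchall()
```

Calling diagnoses_by_seq_type(build_demo_db()) returns one tuple per seq_type and diagnosis code. The same SELECT text could be tried against the real table, subject to the dialect caveat above.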

Base sequence data

Raw healthcare data gets transformed into sequential tokens by GenHealth's internal processes. Those tokens are stored as historical data and context in columnar databases at GenHealth. This is also GenHealth's ingest format when accepting new and custom patient history data from customers. Extracted data includes medical codes, terminologies, basic demographics, and a unique GenHealth ID for each individual. Here's an example of the sequence data:

┌───────┬───────────────┬─────────────────────┬───────────────────────┬─────────────────────────────────┬────────────────┬─────────┐
│ ghid  ┆ prediction_id ┆ date                ┆ token_type            ┆ token                           ┆ seq_type       ┆ seq_idx │
│ ---   ┆ ---           ┆ ---                 ┆ ---                   ┆ ---                             ┆ ---            ┆ ---     │
│ i32   ┆ i32           ┆ datetime[μs]        ┆ str                   ┆ str                             ┆ str            ┆ i64     │
╞═══════╪═══════════════╪═════════════════════╪═══════════════════════╪═════════════════════════════════╪════════════════╪═════════╡
│ 11535 ┆ 0             ┆ 2018-07-17 00:00:00 ┆ TIME_GAP              ┆ GAP_2                           ┆ actual_context ┆ 0       │
│ 11535 ┆ 0             ┆ 2018-07-19 00:00:00 ┆ YEAR                  ┆ 2019                            ┆ actual_context ┆ 1       │
│ 11535 ┆ 0             ┆ 2018-07-19 00:00:00 ┆ ENROLLMENT_END        ┆                                 ┆ actual_context ┆ 2       │
│ 11535 ┆ 0             ┆ 2018-07-19 00:00:00 ┆ TIME_GAP              ┆ GAP_2                           ┆ actual_context ┆ 3       │
│ 11535 ┆ 0             ┆ 2018-07-21 00:00:00 ┆ COST                  ┆ 31.00                           ┆ actual_context ┆ 4       │
│ 11535 ┆ 0             ┆ 2018-07-21 00:00:00 ┆ TOTAL_CLAIM_COST_KEY  ┆ CLAIM_COST                      ┆ actual_context ┆ 5       │
│ 11535 ┆ 0             ┆ 2018-07-21 00:00:00 ┆ COST                  ┆ 32.00                           ┆ actual_context ┆ 6       │
│ 11535 ┆ 0             ┆ 2018-07-21 00:00:00 ┆ TOTAL_CLAIM_COST_KEY  ┆ CLAIM_COST                      ┆ actual_context ┆ 7       │
│ 11535 ┆ 0             ┆ 2018-07-21 00:00:00 ┆ COST                  ┆ 88.00                           ┆ actual_context ┆ 8       │
│ 11535 ┆ 0             ┆ 2018-07-21 00:00:00 ┆ TOTAL_CLAIM_COST_KEY  ┆ CLAIM_COST                      ┆ actual_context ┆ 9       │
│ …     ┆ …             ┆ …                   ┆ …                     ┆ …                               ┆ …              ┆ …       │
│ 11535 ┆ 0             ┆ 2020-07-11 00:00:00 ┆ ICD_10_CM             ┆ E11.65                          ┆ actual_context ┆ 1987    │
│ 11535 ┆ 0             ┆ 2020-07-11 00:00:00 ┆ NOT_ON_ADMISSION      ┆                                 ┆ actual_context ┆ 1988    │
│ 11535 ┆ 0             ┆ 2020-07-11 00:00:00 ┆ PLACE_OF_SERVICE      ┆ OFFICE                          ┆ actual_context ┆ 1989    │
│ 11535 ┆ 0             ┆ 2020-07-11 00:00:00 ┆ CPT_HCPCS_CODE        ┆ G0108                           ┆ actual_context ┆ 1990    │
│ 11535 ┆ 0             ┆ 2020-07-11 00:00:00 ┆ HCPCS_MODIFIER        ┆ Modifier_95                     ┆ actual_context ┆ 1991    │
│ 11535 ┆ 0             ┆ 2020-07-11 00:00:00 ┆ TOTAL_CLAIM_COST_KEY  ┆ CLAIM_COST                      ┆ actual_context ┆ 1992    │
│ 11535 ┆ 0             ┆ 2020-07-11 00:00:00 ┆ COST                  ┆ 40.00                           ┆ actual_context ┆ 1993    │
│ 11535 ┆ 0             ┆ 2020-07-11 00:00:00 ┆ TIME_GAP              ┆ GAP_5                           ┆ actual_context ┆ 1994    │
│ 11535 ┆ 0             ┆ 2020-07-16 00:00:00 ┆ ON_ADMISSION          ┆                                 ┆ actual_context ┆ 1995    │
│ 11535 ┆ 0             ┆ 2020-07-16 00:00:00 ┆ UNCLEAR_IF_ON_ADMISSION┆                                ┆ actual_context ┆ 1996    │
│ 11535 ┆ 0             ┆ 2020-07-16 00:00:00 ┆ ICD_10_CM             ┆ E11.65                          ┆ actual_context ┆ 1997    │
│ 11535 ┆ 0             ┆ 2020-07-16 00:00:00 ┆ NOT_ON_ADMISSION      ┆                                 ┆ actual_context ┆ 1998    │
│ 11535 ┆ 0             ┆ 2020-07-16 00:00:00 ┆ PLACE_OF_SERVICE      ┆ OFFICE                          ┆ actual_context ┆ 1999    │
│ 11535 ┆ 0             ┆ 2020-07-16 00:00:00 ┆ CPT_HCPCS_CODE        ┆ G0108                           ┆ actual_context ┆ 2000    │
│ …     ┆ …             ┆ …                   ┆ …                     ┆ …                               ┆ …              ┆ …       │
│ 11535 ┆ 0             ┆ 2020-08-03 00:00:00 ┆ DRUG_TEXT             ┆ 3 ML insulin detemir 100 UNT/M… ┆ actual_context ┆ 2100    │
│ 11535 ┆ 0             ┆ 2020-08-03 00:00:00 ┆ DRUG_TEXT             ┆ Levemir FlexTouch Subcutaneous… ┆ actual_context ┆ 2116    │
│ 11535 ┆ 0             ┆ 2020-08-03 00:00:00 ┆ TIME_GAP              ┆ GAP_4                           ┆ actual_context ┆ 2128    │
│ 11535 ┆ 0             ┆ 2020-08-07 00:00:00 ┆ DRUG_TEXT             ┆ lisinopril 20 MG Oral Tablet    ┆ actual_context ┆ 2131    │
│ 11535 ┆ 0             ┆ 2020-08-07 00:00:00 ┆ DRUG_TEXT             ┆ Lisinopril Oral                 ┆ actual_context ┆ 2139    │
└───────┴───────────────┴─────────────────────┴───────────────────────┴─────────────────────────────────┴────────────────┴─────────┘
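
In the sample rows above, each TIME_GAP token appears to carry the number of days until the next event: a GAP_2 row dated 2018-07-17 is followed by rows dated 2018-07-19. The sketch below parses such tokens and checks that relationship on the first five sample rows. The GAP_<days> format and the date rule are inferred from the sample only, and the function names are hypothetical.

```python
from datetime import date, timedelta


def gap_days(token: str) -> int:
    """Parse a TIME_GAP token such as 'GAP_2' into a number of days."""
    prefix, _, days = token.partition("_")
    if prefix != "GAP" or not days.isdigit():
        raise ValueError(f"unexpected TIME_GAP token: {token!r}")
    return int(days)


def check_gaps(rows):
    """Return (date, token, next_date) for every TIME_GAP row whose
    following row is not dated exactly gap_days later.

    rows: (date, token_type, token) tuples in seq_idx order.
    """
    mismatches = []
    for (d, token_type, token), (next_d, _, _) in zip(rows, rows[1:]):
        if token_type == "TIME_GAP" and d + timedelta(days=gap_days(token)) != next_d:
            mismatches.append((d, token, next_d))
    return mismatches


# First five rows of the sample sequence (seq_idx 0 to 4).
SAMPLE = [
    (date(2018, 7, 17), "TIME_GAP", "GAP_2"),
    (date(2018, 7, 19), "YEAR", "2019"),
    (date(2018, 7, 19), "ENROLLMENT_END", ""),
    (date(2018, 7, 19), "TIME_GAP", "GAP_2"),
    (date(2018, 7, 21), "COST", "31.00"),
]
```

An empty result from check_gaps(SAMPLE) means every gap in the excerpt matches the dates of the rows that follow it.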

Data Ingest

Healthcare data is inherently diverse, spanning from standardized formats like FHIR and HL7v2 to more unique structures such as internal databases, tables, and even text within PDFs. At GenHealth.ai, we recognize the complexity of this data landscape and have tailored our services to accommodate this variety. Our team has extensive experience working with all these data sources, ensuring that we can support our customers in ingesting their data effectively. In collaboration with our customers, we navigate these complexities, leveraging our expertise to integrate data seamlessly into our platform, regardless of its original format. This approach allows us to provide comprehensive solutions that meet our customers' specific needs, facilitating better healthcare outcomes through advanced AI insights.

All relevant tokens and codes are extracted into sequences of events and stored in the stacked_lff table described above.

Custom identifiable data

In addition to the medical and claims data provided, we have a separate table (the patient table) that contains personally identifiable health information. This table provides a crosswalk between the GenHealth ID and the customer specific ID. In addition, it supports a set of attributes for each patient which include the following:

  1. id (string) - the customer provided patient id
  2. ghid (string) - the GenHealth.ai patient id
  3. name_given (string) - the given name of this individual
  4. name_middle (string) - the middle name of this individual
  5. name_family (string) - the family or last name of this individual
  6. birthdate (date) - the date of birth (this will also be extracted into sequences but can be useful here as well)
  7. address_line (string) - The street address or PO Box number of the patient's home mailing address.
  8. address_city (string) - The city of the patient's address.
  9. address_state (string) - The state, province, or region of the patient's address.
  10. address_postalCode (string) - The postal or ZIP code of the patient's address.
  11. address_country (string) - The country of the patient's address.
  12. phone_home (string) - The patient's home telephone number.
  13. phone_work (string) - The patient's work telephone number.
  14. phone_mobile (string) - The patient's mobile telephone number.
  15. email (string) - The patient's email address, recognizing the modern necessity of email communication.
  16. maritalStatus (string) - Indicates the patient's current marital status, offering context for social determinants of health.
  17. language (string) - The primary language spoken by the patient, essential for personalized communication and cultural competence in healthcare delivery.
  18. ethnicity (string) - Information on the patient's ethnic background, aiding in addressing health disparities and tailoring health interventions. (this will also be extracted into sequences but can be useful here as well)
  19. contact (string) - Information about a person to be contacted in case of emergency, including the relationship to the patient and contact details.
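
The crosswalk can be sketched as a join on ghid. The example below uses in-memory SQLite tables as stand-ins, which is an assumption, along with the table name patient and the helper names. Because ghid is listed as a string in this table and as an int in stacked_lff, the join casts it explicitly; whether the real database needs that cast is not confirmed. The patient row is fictitious.

```python
import sqlite3

# Look up sequence events by the customer-provided patient id.
CROSSWALK_SQL = """
SELECT p.id AS customer_id, s.token_type, s.token
FROM patient AS p
JOIN stacked_lff AS s ON s.ghid = CAST(p.ghid AS INTEGER)
WHERE p.id = ?
ORDER BY s.seq_idx
"""


def build_crosswalk_demo() -> sqlite3.Connection:
    """Create reduced stand-ins for the patient and stacked_lff tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patient (id TEXT, ghid TEXT, name_family TEXT)")
    conn.execute(
        "CREATE TABLE stacked_lff (ghid INTEGER, token_type TEXT, "
        "token TEXT, seq_type TEXT, seq_idx INTEGER)"
    )
    # Fictitious patient: 'cust-001' is a made-up customer id.
    conn.execute("INSERT INTO patient VALUES ('cust-001', '11535', 'Example')")
    conn.executemany(
        "INSERT INTO stacked_lff VALUES (?, ?, ?, ?, ?)",
        [
            (11535, "ICD_10_CM", "E11.65", "actual_context", 1987),
            (11535, "CPT_HCPCS_CODE", "G0108", "actual_context", 1990),
        ],
    )
    return conn


def events_for_customer_id(conn: sqlite3.Connection, customer_id: str) -> list:
    """Return (customer_id, token_type, token) tuples in sequence order."""
    return conn.execute(CROSSWALK_SQL, (customer_id,)).fetchall()
```

Keeping identifiable attributes in their own table means analytics queries on stacked_lff can run without touching names or addresses. A join like this one is only needed when results must be mapped back to customer IDs.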

SQL API

The SQL API allows users to query the underlying data from an application of their choice. This API bypasses the G-Mode LLM entirely and sends the input SQL text directly as a database query. Requests are authenticated with your GenHealth.ai API key. The SQL API currently limits results to 100 rows.

Query

curl -XPOST 'https://gmode.genhealth.ai/api/sql' \
        -H 'Content-Type: application/json' \
        -H 'Authorization: Bearer gh_s_k-xxx' \
        -d '{"sql": "SELECT * FROM genhealth_seq"}'

Response

{
  "rows": [
    {
      "ghid": 1,
      "prediction_id": 0,
      "date": "2020-02-14",
      "token_type": "TIME_GAP",
      "token": "GAP_13",
      "seq_type": "actual_context"
    },
    {
      "ghid": 1,
      "prediction_id": 0,
      "date": "2020-02-27",
      "token_type": "TOTAL_CLAIM_COST_KEY",
      "token": "CLAIM_COST",
      "seq_type": "actual_context"
    }
    ...
  ],
  "sql": "SELECT * FROM genhealth_seq\nLIMIT 100;"
}
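
The same call can be sketched in Python using only the standard library. The endpoint, headers, request body, and the "rows" response key are taken from the example above. The function names, the timeout, and the truncation check are hypothetical, and error handling is left out.

```python
import json
import urllib.request

SQL_ENDPOINT = "https://gmode.genhealth.ai/api/sql"  # from the curl example
MAX_ROWS = 100  # documented row limit of the SQL API


def build_sql_request(sql: str, api_key: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        SQL_ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


def run_sql(sql: str, api_key: str, timeout: float = 30.0) -> list:
    """Send the query and return the list under the "rows" key."""
    request = build_sql_request(sql, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["rows"]


def may_be_truncated(rows: list) -> bool:
    """True when the result hit the row limit, so more rows may exist."""
    return len(rows) >= MAX_ROWS
```

A caller would pass its own key, for example run_sql("SELECT * FROM genhealth_seq", api_key). When may_be_truncated returns True, narrowing the query with a WHERE clause or an aggregate is one way to stay under the limit.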