Machine-learning based query construction and pattern identification for hereditary angioedema

HVH Precision Analytics, LLC

2022

Online Patent

Access:

View record in USPTO Patent Grants (full text)

A method, computer program product, and system identifying a probability of a medical condition in a patient. The method includes a processor obtaining data set(s) related to a patient population diagnosed with a medical condition and based on a frequency of features in the data set(s), identifying common features and weighting the common features based on frequency of occurrence in the data set(s) to generate mutual information. The processor generates pattern(s) including a portion of the common features to generate a machine learning algorithm(s). The processor compiles a training set of data to use to tune the machine learning algorithm(s). The processor dynamically adjusts common features in the pattern(s) such that the machine learning algorithm(s) can distinguish patient data indicating the medical condition from patient data not indicating the medical condition. The processor applies the machine learning algorithm(s) to data related to the undiagnosed patient, to determine the probability.
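The abstract does not say exactly how frequency weighting yields mutual information. The Python sketch below shows one plausible reading. It assumes that each patient record is reduced to a set of feature labels (for example diagnosis codes or therapies), and that a comparison cohort without the condition is available, as in the training-set step of claim 1. Features are first filtered by how often they occur in the diagnosed cohort, then scored by mutual information with the diagnosis label. Finally, the smallest set of high-scoring features that holds more than half of the summed score is kept, mirroring the "enhanced patient definition" of claim 1. The function names, the thresholds, the toy feature labels, and the simple sum of per-feature scores (which ignores redundancy between features) are illustrative assumptions, not the patented implementation.

```python
import math
from collections import Counter
from typing import Dict, List, Set

def feature_frequencies(cohort: List[Set[str]]) -> Dict[str, float]:
    """Fraction of patients in the cohort whose record contains each feature."""
    counts = Counter(f for record in cohort for f in record)
    return {f: c / len(cohort) for f, c in counts.items()}

def mutual_information(feature: str, cases: List[Set[str]], controls: List[Set[str]]) -> float:
    """Mutual information (bits) between feature presence and the case/control label."""
    n = len(cases) + len(controls)
    joint = Counter()  # keys: (feature_present, is_case)
    for record in cases:
        joint[(feature in record, True)] += 1
    for record in controls:
        joint[(feature in record, False)] += 1
    mi = 0.0
    for (present, is_case), count in joint.items():
        p_xy = count / n
        p_x = (joint[(present, True)] + joint[(present, False)]) / n
        p_y = (len(cases) if is_case else len(controls)) / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi

def enhanced_definition(cases: List[Set[str]], controls: List[Set[str]],
                        min_freq: float = 0.3, mi_threshold: float = 0.01,
                        share: float = 0.5) -> List[str]:
    """Hypothetical reading of claim 1: fewest features holding > `share` of the MI."""
    # 1. "Common features": frequent enough within the diagnosed cohort.
    common = [f for f, q in feature_frequencies(cases).items() if q >= min_freq]
    # 2. Score each common feature by MI with the diagnosis label.
    scores = {f: mutual_information(f, cases, controls) for f in common}
    # 3. Keep features above the MI threshold, strongest first.
    ranked = sorted((f for f in common if scores[f] > mi_threshold),
                    key=lambda f: scores[f], reverse=True)
    total = sum(scores[f] for f in ranked)
    # 4. Greedy prefix: taking the largest scores first gives the fewest
    #    features whose summed MI exceeds the required share of the total.
    chosen, running = [], 0.0
    for f in ranked:
        chosen.append(f)
        running += scores[f]
        if running > share * total:
            break
    return chosen

cases = [{"edema", "abdominal_pain"}, {"edema", "allergic_reaction"}, {"edema", "routine_exam"}]
controls = [{"routine_exam"}, {"abdominal_pain", "routine_exam"}, {"routine_exam"}]
print(enhanced_definition(cases, controls))  # ['edema']
```

Mutual information does not encode direction. For that reason, routine_exam, which is more frequent among the controls, still scores about 0.46 bits here. A production system would presumably also check the sign of the association.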

Patent Number: 11,270,797
Publication Date: March 08, 2022
Appl. No.: 15/724480
Application Filed: October 04, 2017
Assignees: HVH Precision Analytics LLC (King of Prussia, PA, US)
Indexed in: USPTO Patent Grants
Language: English

Claim: 1. A computer-implemented method, comprising: obtaining, by one or more processors in a distributed computing environment, one or more machine-readable data sets related to a patient population from one or more databases; identifying, by the one or more processors, based on an initial patient definition, a portion of data from the machine-readable data sets related to the patient population, wherein the portion of the data comprises patients of the patient population with an orphan disease, wherein the orphan disease is Hereditary Angioedema, and wherein the one feature category is selected from the group consisting of: diagnosis codes, procedures, therapies, providers, and locations; based on a frequency of features in the portion of the data, identifying, by the one or more processors, common features in the portion of the data and weighting the common features based on frequency of occurrence in the portion of the data, wherein the common features comprise mutual information, wherein the mutual information comprises features from a plurality of feature categories and wherein each pattern of the one or more patterns comprising a portion of the common features comprises features in one feature category of the plurality of feature categories; utilizing, by the one or more processors, the mutual information to update the initial patient definition, to generate an enhanced patient definition, wherein generating the enhanced patient definition comprises: identifying, by the one or more processors, one or more features of the common features with mutual information values above a predefined threshold; and selecting, by the one or more processors, a portion of the common features, wherein the portion of the common features
comprises a smallest subset of features from the one or more features that collectively contain a majority of the mutual information, wherein the portion of the common features comprises the enhanced patient definition, and wherein the portion of the common features comprises a smallest number of the common features that is a largest number of differentiating characteristics of the patient population diagnosed with the orphan disease; generating, by the one or more processors, one or more patterns comprising the portion of the common features; generating, by the one or more processors, one or more machine learning algorithms based on the one or more patterns, the one or more machine learning algorithms to identify presence or absence of the given orphan disease in an undiagnosed patient based on absence or presence of features comprising the one or more patterns in data related to the undiagnosed patient; utilizing, by the one or more processors, statistical sampling to compile a training set of data, wherein the training set comprises data from the one or more data sets and at least one additional data set comprising data related to a population without the orphan disease, and wherein utilizing the statistical sampling comprises formulating and obtaining queries based on the data set and processing and responding to the queries, the processing comprising, for each query: evaluating, by the one or more processors, the query to determine if a prospective response to the query is a single value pulled from a single data set; based on determining that the prospective response to the query is the single value pulled from the single data set, assigning, by the one or more processors, the query to a given computing resource in the distributed computing environment; and based on determining that the prospective response to the query, distributing, by the one or more processors, the query over a group of computing resources of the distributed computing environment to 
maximize efficiency, wherein the distributing comprises assigning each computing resource of the group of computing resources a portion of the query to execute in parallel with at least one other computing resource of the group of computing resources executing another portion of the query; tuning, by the one or more processors, the one or more machine learning algorithms by applying the one or more machine learning algorithms to the training set of data; dynamically adjusting, by the one or more processors, the common features comprising the one or more patterns to improve accuracy such that the one or more machine learning algorithms can distinguish patient data indicating the orphan disease from patient data that does not indicate the orphan disease; and determining, by the one or more processors, based on applying the one or more machine learning algorithms to data related to the undiagnosed patient, a probability, wherein the probability is a numerical value indicating a percentage of commonality between the data related to the undiagnosed patient and the one or more patterns, wherein the probability indicates a probability that the undiagnosed patient will be diagnosed with the orphan disease in the future. Claim: 2. The method of claim 1, wherein the initial patient definition is selected from the group consisting of: a pre-defined diagnosis code and a pre-defined medication. Claim: 3. The method of claim 1, wherein the one or more machine-readable data sets comprise the data related to the undiagnosed patient. Claim: 4.
The method of claim 3, further comprising: determining, by the one or more processors, based on applying the one or more machine learning algorithms to data related to each patient not included in the portion of the data, for each patient, a respective probability, wherein the respective probability is a numerical value indicating the percentage of commonality between the data related to the undiagnosed patient and the one or more patterns. Claim: 5. The method of claim 4, further comprising: ranking, by the one or more processors, the probability and the respective probabilities, in order of relevance; and notifying, by the one or more processors, through an electronic communication, a user of an identity of any patient in the one or more machine-readable data sets with a probability above a predetermined threshold; and automatically ordering, by the one or more processors, based on communicating with an order management system over a network connection, a clinical test for the orphan disease, wherein a number of tests ordered is directly proportional to a number of patients with the probability above the predetermined threshold. Claim: 6. The method of claim 1, wherein the generating the one or more patterns comprises: ranking, by the one or more processors, the common features based on the weighting; and retaining, by the one or more processors, the portion of the common features wherein the portion comprises common features of a pre-defined weight, wherein the portion comprises the one or more patterns. Claim: 7. The method of claim 1, wherein the pre-defined medication is selected from the group consisting of: Cinryze, Firazyr, Berinert, and Kalbitor. Claim: 8.
The method of claim 7, wherein the feature category is diagnosis codes and one of the features is selected from the group consisting of: an allergic reaction, a swelling, mass, or lump in head and neck, a routine general medical examination at a healthcare facility, an immunization and screening for an infectious disease, another screening for suspected conditions that are not mental disorders or infectious diseases, an edema, an abdominal pain at an unspecified site, another upper respiratory disease, an unspecified symptom associated with female genital organs, and a chronic vascular insufficiency of the intestine. Claim: 9. The method of claim 7, wherein the feature category is procedures and one of the features is selected from the group consisting of: an office or other outpatient visit for the evaluation and management of an established patient, another laboratory procedure, an office or other outpatient visit for the evaluation and management of an established patient, a chemistry and hematology laboratory procedure, another therapeutic procedure, a pathology procedure, another diagnostic radiology and related technique, a microscopic examination, an office or other outpatient visit for evaluation and management of an established patient, and a nonoperative urinary system measurement. Claim: 10. The method of claim 7, wherein the feature category is therapies and one of the features is selected from the group consisting of: androgens and combinations, blood derivatives, androgens and combinations, unspecified agents, sympathomimetic agents, adrenals and combinations, analgesics or antipyretics that are opiate agonists, antibiotics that are penicillins, antibiotics that are erythromycin and macrolide, and analgesics or antipyretics that are nonsteroidal anti-inflammatory drugs. Claim: 11.
The method of claim 7, wherein the feature category is providers and one of the features is selected from the group consisting of: an outpatient hospital, an office, an independent laboratory, an emergency department, an inpatient hospital, an independent clinic, a patient home, an outpatient location that is not elsewhere classified, an ambulatory surgical center; and a land ambulance. Claim: 12. The method of claim 1, wherein the one or more machine learning algorithms comprise a linear Support Vector Machines classification algorithm. Claim: 13. The method of claim 1, wherein the one or more machine learning algorithms comprise at least two machine learning algorithms and wherein the tuning further comprises: compiling results of the tuning of each of the at least two machine learning algorithms and utilizing ensemble learning to consolidate portions of the at least two machine learning algorithms into a single machine learning algorithm. Claim: 14. The method of claim 1, the tuning further comprising: associating, by the one or more processors, based on applying the one or more machine learning algorithms to the training set of test data, probabilities to a portion of the records in the training set of test data, wherein the probabilities reflect a likelihood of presence of the orphan disease for each record training set of test data; and completing the dynamically adjusting of the common features when the probabilities are within a pre-defined accuracy threshold. Claim: 15.
The method of claim 1, wherein the determining the probability comprises: obtaining, by the one or more processors, from a computing resource, electronic medical records for the undiagnosed patient for a defined temporal period, wherein the electronic medical records comprise electronic contact information for a healthcare provider to the undiagnosed patient; applying, by the one or more processors, the one or more machine learning algorithms to the electronic medical records; determining, by the one or more processors, based on the applying, if the probability is within a predetermined range; and based on determining that the probability exceeds a predetermined threshold, electronically alerting, in real time, the healthcare provider to the undiagnosed patient of the probability. Claim: 16. The method of claim 15, further comprising: retaining, by the one or more processors, in a memory resource communicatively coupled to the one or more processors, the one or more patterns; obtaining, by the one or more processors, an indication regarding accuracy of the probability; and updating, by the one or more processors, the one or more patterns based on the indication. Claim: 17.
A computer program product comprising: a non-transitory computer readable storage medium readable by one or more processors in a distributed computing environment, and storing instructions for execution by the one or more processors for performing a method comprising: obtaining, by the one or more processors in a distributed computing environment, one or more machine-readable data sets related to a patient population from one or more databases; identifying, by the one or more processors, based on an initial patient definition, a portion of data from the machine-readable data sets related to the patient population, wherein the portion of the data comprises patients of the patient population with an orphan disease, wherein the orphan disease is Hereditary Angioedema, and wherein the one feature category is selected from the group consisting of: diagnosis codes, procedures, therapies, providers, and locations; based on a frequency of features in the portion of the data, identifying, by the one or more processors, common features in the portion of the data and weighting the common features based on frequency of occurrence in the portion of the data, wherein the common features comprise mutual information, wherein the mutual information comprises features from a plurality of feature categories and wherein each pattern of the one or more patterns comprising a portion of the common features comprises features in one feature category of the plurality of feature categories; utilizing, by the one or more processors, the mutual information to update the initial patient definition, to generate an enhanced patient definition, wherein generating the enhanced patient definition comprises: identifying, by the one or more processors, one or more features of the common features with mutual information values above a predefined threshold; and selecting, by the one or more processors, a portion of the common features, wherein the portion of the common features comprises a smallest 
subset of features from the one or more features that collectively contain a majority of the mutual information, wherein the portion of the common features comprises the enhanced patient definition, and wherein the portion of the common features comprises a smallest number of the common features that is a largest number of differentiating characteristics of the patient population diagnosed with the orphan disease; generating, by the one or more processors, one or more patterns comprising the portion of the common features; generating, by the one or more processors, one or more machine learning algorithms based on the one or more patterns, the one or more machine learning algorithms to identify presence or absence of the given orphan disease in an undiagnosed patient based on absence or presence of features comprising the one or more patterns in data related to the undiagnosed patient; utilizing, by the one or more processors, statistical sampling to compile a training set of data, wherein the training set comprises data from the one or more data sets and at least one additional data set comprising data related to a population without the orphan disease, and wherein utilizing the statistical sampling comprises formulating and obtaining queries based on the data set and processing and responding to the queries, the processing comprising, for each query: evaluating, by the one or more processors, the query to determine if a prospective response to the query is a single value pulled from a single data set; based on determining that the prospective response to the query is the single value pulled from the single data set, assigning, by the one or more processors, the query to a given computing resource in the distributed computing environment; and based on determining that the prospective response to the query, distributing, by the one or more processors, the query over a group of computing resources of the distributed computing environment to maximize efficiency, 
wherein the distributing comprises assigning each computing resource of the group of computing resources a portion of the query to execute in parallel with at least one other computing resource of the group of computing resources executing another portion of the query; tuning, by the one or more processors, the one or more machine learning algorithms by applying the one or more machine learning algorithms to the training set of data; dynamically adjusting, by the one or more processors, the common features comprising the one or more patterns to improve accuracy such that the one or more machine learning algorithms can distinguish patient data indicating the orphan disease from patient data that does not indicate the orphan disease; and determining, by the one or more processors, based on applying the one or more machine learning algorithms to data related to the undiagnosed patient, a probability, wherein the probability is a numerical value indicating a percentage of commonality between the data related to the undiagnosed patient and the one or more patterns, wherein the probability indicates a probability that the undiagnosed patient will be diagnosed with the orphan disease in the future. Claim: 18. 
A system comprising: one or more memory; one or more processors in communication with the memory; and program instructions executable by the one or more processors in a distributed computed environment via the one or more memory to perform a method, the method comprising: obtaining, by the one or more processors in a distributed computing environment, one or more machine-readable data sets related to a patient population from one or more databases; identifying, by the one or more processors, based on an initial patient definition, a portion of data from the machine-readable data sets related to the patient population, wherein the portion of the data comprises patients of the patient population with an orphan disease, wherein the orphan disease is Hereditary Angioedema, and wherein the one feature category is selected from the group consisting of: diagnosis codes, procedures, therapies, providers, and locations; based on a frequency of features in the portion of the data, identifying, by the one or more processors, common features in the portion of the data and weighting the common features based on frequency of occurrence in the portion of the data, wherein the common features comprise mutual information, wherein the mutual information comprises features from a plurality of feature categories and wherein each pattern of the one or more patterns comprising a portion of the common features comprises features in one feature category of the plurality of feature categories; utilizing, by the one or more processors, the mutual information to update the initial patient definition, to generate an enhanced patient definition, wherein generating the enhanced patient definition comprises: identifying, by the one or more processors, one or more features of the common features with mutual information values above a predefined threshold; and selecting, by the one or more processors, a portion of the common features, wherein the portion of the common features comprises a smallest 
subset of features from the one or more features that collectively contain a majority of the mutual information, wherein the portion of the common features comprises the enhanced patient definition, and wherein the portion of the common features comprises a smallest number of the common features that is a largest number of differentiating characteristics of the patient population diagnosed with the orphan disease; generating, by the one or more processors, one or more patterns comprising the portion of the common features; generating, by the one or more processors, one or more machine learning algorithms based on the one or more patterns, the one or more machine learning algorithms to identify presence or absence of the given orphan disease in an undiagnosed patient based on absence or presence of features comprising the one or more patterns in data related to the undiagnosed patient; utilizing, by the one or more processors, statistical sampling to compile a training set of data, wherein the training set comprises data from the one or more data sets and at least one additional data set comprising data related to a population without the orphan disease, and wherein utilizing the statistical sampling comprises formulating and obtaining queries based on the data set and processing and responding to the queries, the processing comprising, for each query: evaluating, by the one or more processors, the query to determine if a prospective response to the query is a single value pulled from a single data set; based on determining that the prospective response to the query is the single value pulled from the single data set, assigning, by the one or more processors, the query to a given computing resource in the distributed computing environment; and based on determining that the prospective response to the query, distributing, by the one or more processors, the query over a group of computing resources of the distributed computing environment to maximize efficiency, 
wherein the distributing comprises assigning each computing resource of the group of computing resources a portion of the query to execute in parallel with at least one other computing resource of the group of computing resources executing another portion of the query; tuning, by the one or more processors, the one or more machine learning algorithms by applying the one or more machine learning algorithms to the training set of data; dynamically adjusting, by the one or more processors, the common features comprising the one or more patterns to improve accuracy such that the one or more machine learning algorithms can distinguish patient data indicating the orphan disease from patient data that does not indicate the orphan disease; and determining, by the one or more processors, based on applying the one or more machine learning algorithms to data related to the undiagnosed patient, a probability, wherein the probability is a numerical value indicating a percentage of commonality between the data related to the undiagnosed patient and the one or more patterns, wherein the probability indicates a probability that the undiagnosed patient will be diagnosed with the orphan disease in the future. Patent References Cited: 8068993 November 2011 Karlov et al. ; 2009/0171871 July 2009 Zhang et al. ; 2009/0171956 July 2009 Gupta ; 2012/0271612 October 2012 Barsoum ; 2013/0071860 March 2013 Hale ; 2013/0238533 September 2013 Virkar ; 2013/0262357 October 2013 Amarasingham et al. ; 2014/0095201 April 2014 Farooq ; 2014/0278448 September 2014 Sadeghi ; 2014/0279746 September 2014 De Bruin ; 2015/0324527 November 2015 Siegel et al. ; 2016/0063212 March 2016 Monier ; 2017/0053665 February 2017 Quatieri, Jr. et al. ; 2017/0124269 May 2017 McNair ; 2017/0198349 July 2017 Rice ; 2017/0262604 September 2017 Francois ; 2017/0286622 October 2017 Cox ; 2017/0308981 October 2017 Razavian et al. ; 2019/0019581 January 2019 Vaughan et al. 
; 2019/0138693 May 2019 Muller ; 2020/0151627 May 2020 Shukla et al. ; WO2016094330 June 2016 ; WO2018090009 May 2018 ; WO2020102220 May 2020 ; WO2020132468 June 2020 Other References: Kvancz, Predictive Analytics: A Case Study in Machine-Learning and Claims Databases, Dec. 2016 (Year: 2016). cited by examiner ; Speiser et al., “Random Forest Classification of Etiologies for an Orphan Disease”, Statistics in Medicine, 34, 887-899, doi:10.1002/sim.6351, Year: 2015. cited by applicant ; Huw Llewelyn, “Reasoning in Medicine and Science”, Sep. 2015, https://blog.oup.com/2013/09/medical-diagnosis-reasoning-probable-elimination/ (Accessed via Wayback Machine). cited by applicant ; International Search Report and Written Opinion of International Application No. PCT/US2019/060962, dated Mar. 9, 2020, 9 pages. cited by applicant ; International Search Report and Written Opinion for International Application No. PCT/US2019/067893, dated Mar. 15, 2020, 8 pages. cited by applicant Assistant Examiner: Lee, Andrew E. Primary Examiner: Morgan, Robert W Attorney, Agent or Firm: Heslin Rothenberg Farley & Mesiti ; Ziegler, Esq., Kristian E. ; Pearlman, Esq., Rachel L.
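Claim 1 also routes each sampling query according to its expected answer. A query whose response is a single value pulled from a single data set is assigned to one computing resource. Presumably any other query is split into portions that several resources execute in parallel. The following is a minimal sketch of that dispatch rule. It assumes in-memory tables of dict rows and uses a local thread pool as a stand-in for the nodes of a distributed computing environment. The Query class, its fields, the dispatch function, and the equal-size chunking strategy are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]

@dataclass
class Query:
    """Hypothetical query description used only for this sketch."""
    datasets: List[str]               # data sets the query reads
    returns_single_value: bool        # e.g. one field of one record
    predicate: Callable[[Row], bool]  # row filter applied by each worker

def is_single_value_single_dataset(q: Query) -> bool:
    return q.returns_single_value and len(q.datasets) == 1

def run_portion(rows: List[Row], q: Query) -> List[Row]:
    """Work done by one computing resource: filter its slice of rows."""
    return [r for r in rows if q.predicate(r)]

def dispatch(q: Query, tables: Dict[str, List[Row]], workers: int = 4) -> Tuple[List[Row], int]:
    """Return the query result and the number of resources that executed it."""
    rows = [r for name in q.datasets for r in tables[name]]
    if is_single_value_single_dataset(q):
        # Simple lookup: assign the whole query to a single resource.
        return run_portion(rows, q), 1
    # Otherwise split the scan into roughly equal portions run in parallel.
    size = max(1, -(-len(rows) // workers))  # ceiling division
    portions = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(run_portion, portions, [q] * len(portions)))
    # pool.map preserves input order, so concatenation keeps the row order.
    return [r for part in parts for r in part], len(portions)

tables = {"claims": [{"id": i, "code": "edema" if i % 3 == 0 else "other"} for i in range(10)]}
scan = Query(["claims"], False, lambda r: r["code"] == "edema")
lookup = Query(["claims"], True, lambda r: r["id"] == 5)
print(dispatch(scan, tables))    # rows with ids 0, 3, 6, 9, computed by 4 parallel portions
print(dispatch(lookup, tables))  # ([{'id': 5, 'code': 'other'}], 1)
```

In a real cluster, the portions would more likely follow the existing data partitioning than an even split of rows. The even split simply keeps the sketch self-contained.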

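Claims 1, 4 and 5 describe the output as a "probability" that is defined as a percentage of commonality between a patient's data and the patterns. The claims then rank patients and flag those above a threshold. The sketch below reads "commonality" as the share of a pattern's features present in a patient's record, and when several patterns exist, it takes the best-matching one. Both readings, the function names, the toy records, and the one-test-per-flagged-patient rule used to illustrate the "directly proportional" language of claim 5 are assumptions. Such a score is an overlap measure, not a calibrated probability.

```python
from typing import Dict, List, Set, Tuple

def commonality(record: Set[str], pattern: Set[str]) -> float:
    """Percentage (0-100) of the pattern's features present in the patient's record."""
    if not pattern:
        return 0.0
    return 100.0 * len(record & pattern) / len(pattern)

def score(record: Set[str], patterns: List[Set[str]]) -> float:
    # Assumption: with several patterns, report the best-matching one.
    return max((commonality(record, p) for p in patterns), default=0.0)

def rank_and_flag(records: Dict[str, Set[str]], patterns: List[Set[str]],
                  threshold: float) -> Tuple[List[Tuple[str, float]], List[str], int]:
    """Rank patients by score, flag those above `threshold`, count tests to order."""
    ranked = sorted(((pid, score(rec, patterns)) for pid, rec in records.items()),
                    key=lambda item: item[1], reverse=True)
    flagged = [pid for pid, s in ranked if s > threshold]
    tests_to_order = len(flagged)  # hypothetical 1:1 reading of "directly proportional"
    return ranked, flagged, tests_to_order

pattern = {"edema", "abdominal_pain", "allergic_reaction", "androgens"}
records = {
    "p1": {"edema", "abdominal_pain", "allergic_reaction"},
    "p2": {"edema"},
    "p3": {"edema", "abdominal_pain", "allergic_reaction", "androgens", "routine_exam"},
}
ranked, flagged, tests = rank_and_flag(records, [pattern], threshold=50.0)
print(ranked)   # [('p3', 100.0), ('p1', 75.0), ('p2', 25.0)]
print(flagged)  # ['p3', 'p1']
```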