Feature Selection on Dirac
Device: Dirac-1
Introduction
In machine learning problems, we often start with a large number of features and need a feature selection technique that can discover a relatively small subset of the most relevant ones. In what follows, we present a tutorial on using QCi's technology to select a set of features by minimizing their inter-correlation. This approach can be used with any unsupervised machine learning method, such as anomaly detection or clustering algorithms.
Methodology
Let us have a dataset with $N$ samples and $d$ features, represented by a matrix $X \in \mathbb{R}^{N \times d}$. Moreover, let us denote each column of $X$ by $x_i$, where $i \in \{1, 2, \dots, d\}$, representing all samples for feature $i$. We can choose a subset of features of size $d'$ ($d' < d$) such that the inter-correlation of the subset is minimal. We have
$$\min_{z} \sum_{i=1}^{d} \sum_{j=1}^{d} C_{ij} z_i z_j,$$
where
$$C_{ij} = |\rho(x_i, x_j)|.$$
Here $\rho$ denotes a correlation function such as the Pearson correlation, and $z_i \in \{0, 1\}$ is a binary variable indicating inclusion or exclusion of feature $i$. Obviously, $C_{ii} = 1$. The above minimization problem is subject to a constraint,
$$\sum_{i=1}^{d} z_i = d'.$$
We can exclude the diagonal elements of $C$ as, for any selection that satisfies the constraint, they always add up to $d'$. In matrix form we have,
$$\min_{z} z^T (C - I) z,$$
subject to the above constraint. Note that $I$ is a $d \times d$ identity matrix. Note too that the reduced dimension $d'$ is assumed to be given in the above approach.
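To make the formulation concrete, the following minimal sketch solves the same minimization by exhaustive search on a tiny synthetic dataset. The helper name `select_features_brute_force`, the toy data, and the use of the absolute Pearson correlation are assumptions made for illustration; exhaustive search is only practical for a handful of features and is not how the Dirac device works.

```python
from itertools import combinations

import numpy as np


def select_features_brute_force(X, k):
    """Return the k column indices with the smallest summed absolute
    off-diagonal correlation, found by checking every subset."""
    d = X.shape[1]
    C = np.abs(np.corrcoef(X, rowvar=False))
    Q = C - np.eye(d)  # remove the constant diagonal contribution
    best_subset, best_value = None, np.inf
    for subset in combinations(range(d), k):
        z = np.zeros(d)
        z[list(subset)] = 1.0  # binary inclusion vector
        value = z @ Q @ z
        if value < best_value:
            best_subset, best_value = subset, value
    return best_subset, best_value


rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
# Columns 3 and 4 are noisy copies of columns 0 and 1, so they are redundant.
X = np.column_stack([
    base,
    base[:, 0] + 0.01 * rng.normal(size=200),
    base[:, 1] + 0.01 * rng.normal(size=200),
])

subset, value = select_features_brute_force(X, k=3)
print(subset, round(float(value), 3))
```

The selected subset should contain column 2 and only one column from each redundant pair, which is the behavior the objective is designed to produce.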
Medicare Prescription Data
We implemented this approach using a publicly available dataset on the prescription of opioids in the United States. The dataset can be found at https://www.cms.gov/data-research/statistics-trends-and-reports/medicare-provider-utilization-payment-data/part-d-prescriber
Clean data
We start by cleaning the dataset.
In [1]:
import pandas as pd

# Input
INP_FILE = "Medicare_Provider_Utilization_and_Payment_Data__Part_D_Prescriber_Summary_Table_CY2014__50001-NNN__ANON.csv"
OUT_FILE = "cleaned_medicare_data.csv"

CON_VARS = [
    "total_claim_count", "total_30_day_fill_count",
    "total_drug_cost", "total_day_supply", "bene_count",
    "total_claim_count_ge65", "total_30_day_fill_count_ge65",
    "total_drug_cost_ge65", "total_day_supply_ge65", "bene_count_ge65",
    "brand_claim_count", "brand_drug_cost",
    "generic_claim_count", "generic_drug_cost",
    "other_claim_count", "other_drug_cost",
    "mapd_claim_count", "mapd_drug_cost",
    "pdp_claim_count", "pdp_drug_cost",
    "lis_claim_count", "lis_drug_cost",
    "nonlis_claim_count", "nonlis_drug_cost",
    "opioid_claim_count", "opioid_drug_cost",
    "opioid_day_supply", "opioid_bene_count", "opioid_prescriber_rate",
    "antibiotic_claim_count", "antibiotic_drug_cost", "antibiotic_bene_count",
    "hrm_claim_count_ge65", "hrm_drug_cost_ge65", "hrm_bene_count_ge65",
    "antipsych_claim_count_ge65", "antipsych_drug_cost_ge65",
    "antipsych_bene_count_ge65",
    "average_age_of_beneficiaries",
    "beneficiary_age_less_65_count", "beneficiary_age_65_74_count",
    "beneficiary_age_75_84_count", "beneficiary_age_greater_84_count",
    "beneficiary_female_count", "beneficiary_male_count",
    "beneficiary_race_white_count", "beneficiary_race_black_count",
    "beneficiary_race_asian_pi_count", "beneficiary_race_hispanic_count",
    "beneficiary_race_nat_ind_count", "beneficiary_race_other_count",
    "beneficiary_nondual_count", "beneficiary_dual_count",
    "beneficiary_average_risk_score",
]
VALID_PROVIDER_MI = [
    "A", "M", "J", "L", "R", "S", "E", "D", "C", "B", "K", "P", "W",
    "H", "T", "G", "F", "N", "V", "I", "O", "Y", "Z", "U", "Q", "X",
]
VALID_GEN = ["F", "M", "Other", "Unknown"]
VALID_ENTITIES = ["I", "O"]
VALID_DESC_FLAGS = ["S", "T"]
VALID_ENROLLS = ["E", "N", "O"]


# Some utilities
def convert_to_float(x):
    """Convert x to float; return None if it cannot be parsed."""
    try:
        return float(x)
    except (TypeError, ValueError, OverflowError):
        return None


def convert_to_int(x):
    """Convert x to int via float; return None if it cannot be parsed."""
    try:
        return int(float(x))
    except (TypeError, ValueError, OverflowError):
        return None


# Read data
df = pd.read_csv(INP_FILE, on_bad_lines="skip", low_memory=False)

# Clean categorical variables
df["nppes_provider_mi"] = df["nppes_provider_mi"].fillna("Unknown")
df["nppes_provider_mi"] = df["nppes_provider_mi"].apply(
    lambda x: x if x in VALID_PROVIDER_MI else "Unknown"
)
df["nppes_credentials"] = df["nppes_credentials"].fillna("Unknown")
df["nppes_credentials"] = df["nppes_credentials"].apply(
    lambda x: str(x).replace(".", "")
)
cred_hash = {
    "MEDICAL DOCTOR": "MD",
    "NURSE PRACTITIONER": "NP",
}
df["nppes_credentials"] = df["nppes_credentials"].apply(
    lambda x: cred_hash[x] if x in cred_hash else x,
)
df["nppes_provider_gender"] = df["nppes_provider_gender"].fillna("Unknown")
df["nppes_provider_gender"] = df["nppes_provider_gender"].apply(
    lambda x: x if x in VALID_GEN else "Other",
)
df["nppes_entity_code"] = df["nppes_entity_code"].apply(
    lambda x: x if x in VALID_ENTITIES else "Unknown",
)
df["nppes_provider_zip5"] = df["nppes_provider_zip5"].fillna("Unknown")
df["nppes_provider_country"] = df["nppes_provider_country"].apply(
    lambda x: "US" if x == "US" else "Other",
)
df["description_flag"] = df["description_flag"].apply(
    lambda x: x if x in VALID_DESC_FLAGS else "Unknown",
)
df["medicare_prvdr_enroll_status"] = df["medicare_prvdr_enroll_status"].apply(
    lambda x: x if x in VALID_ENROLLS else "Unknown",
)

# Treat missing beneficiary counts, as a count cannot be zero: mark them
# with -1, then replace them with the mean of their specialty (the mean is
# computed after the -1 markers are filled in)
df["bene_count"] = df["bene_count"].apply(convert_to_int).fillna(-1)
tmp_df = df.groupby("specialty_description", as_index=False)["bene_count"].mean()
bene_count_hash = dict(zip(tmp_df["specialty_description"], tmp_df["bene_count"]))
df["bene_count"] = df.apply(
    lambda x: x["bene_count"]
    if x["bene_count"] > 0
    else bene_count_hash[x["specialty_description"]],
    axis=1,
)

# Treat continuous variables
for item in CON_VARS:
    df[item] = df[item].apply(convert_to_float).fillna(0.0)

# Filter out invalid states
df = df[~df["nppes_provider_state"].isin(["XX", "E", "N", "S"])]

# Output
df.to_csv(OUT_FILE, index=False)
Generate features
We then generate features. Each categorical feature is encoded by the per-category average of five opioid-related variables: claim count, drug cost, day supply, beneficiary count, and prescriber rate.
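As a small illustration of this encoding before the full script, the sketch below replaces a categorical column with the mean of one numeric variable within each category. The toy frame and the `state` column are made up for the example; only the `groupby`, `rename`, and `merge` pattern mirrors the cell that follows.

```python
import pandas as pd

# Hypothetical toy data: one categorical column and one numeric variable.
df = pd.DataFrame({
    "state": ["CA", "CA", "NY", "NY", "NY"],
    "opioid_claim_count": [10.0, 30.0, 5.0, 7.0, 9.0],
})

# Mean of the numeric variable within each category.
group_means = (
    df.groupby("state", as_index=False)["opioid_claim_count"]
    .mean()
    .rename(columns={"opioid_claim_count": "state_opioid_claim_count"})
)

# Attach the category-level mean to every row as a numeric feature.
encoded = df.merge(group_means, how="left", on="state")
print(encoded["state_opioid_claim_count"].tolist())
# -> [20.0, 20.0, 7.0, 7.0, 7.0]
```

Every row in the same category receives the same encoded value, so a high-cardinality column such as a city or ZIP code becomes a single numeric column per averaged variable.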
In [2]:
import pandas as pd

# Input
INP_FILE = "cleaned_medicare_data.csv"
OUT_FILE = "medicare_features.csv"

# Set some parameters
CAT_VARS = [
    "nppes_provider_mi",
    # "nppes_credentials",  # This is rather messy, so ignoring it.
    "nppes_provider_gender",
    "nppes_entity_code",
    "nppes_provider_city",
    "nppes_provider_zip5",
    # "nppes_provider_country",  # Almost all cases are US
    "specialty_description",
    "medicare_prvdr_enroll_status",
    "nppes_provider_state",
]
CON_VARS = [
    "total_claim_count", "total_30_day_fill_count",
    "total_drug_cost", "total_day_supply", "bene_count",
    "total_claim_count_ge65", "total_30_day_fill_count_ge65",
    "total_drug_cost_ge65", "total_day_supply_ge65", "bene_count_ge65",
    "brand_claim_count", "brand_drug_cost",
    "generic_claim_count", "generic_drug_cost",
    "other_claim_count", "other_drug_cost",
    "mapd_claim_count", "mapd_drug_cost",
    "pdp_claim_count", "pdp_drug_cost",
    "lis_claim_count", "lis_drug_cost",
    "nonlis_claim_count", "nonlis_drug_cost",
    "opioid_claim_count", "opioid_drug_cost",
    "opioid_day_supply", "opioid_bene_count",
    "antibiotic_claim_count", "antibiotic_drug_cost", "antibiotic_bene_count",
    "hrm_claim_count_ge65", "hrm_drug_cost_ge65", "hrm_bene_count_ge65",
    "antipsych_claim_count_ge65", "antipsych_drug_cost_ge65",
    "antipsych_bene_count_ge65",
    "average_age_of_beneficiaries",
    "beneficiary_age_less_65_count", "beneficiary_age_65_74_count",
    "beneficiary_age_75_84_count", "beneficiary_age_greater_84_count",
    "beneficiary_female_count", "beneficiary_male_count",
    "beneficiary_race_white_count", "beneficiary_race_black_count",
    "beneficiary_race_asian_pi_count", "beneficiary_race_hispanic_count",
    "beneficiary_race_nat_ind_count", "beneficiary_race_other_count",
    "beneficiary_nondual_count", "beneficiary_dual_count",
    "beneficiary_average_risk_score",
]

# Read the cleaned data
df = pd.read_csv(INP_FILE, low_memory=False)

# Embed categorical features
embedded_cat_features = []
for item in CAT_VARS:
    # Per-category means of the opioid-related variables
    tmp_df = (
        df.groupby(item, as_index=False)
        .agg(
            {
                "opioid_claim_count": "mean",
                "opioid_drug_cost": "mean",
                "opioid_day_supply": "mean",
                "opioid_bene_count": "mean",
                "opioid_prescriber_rate": "mean",
            }
        )
        .rename(
            columns={
                "opioid_claim_count": "%s_opioid_claim_count" % item,
                "opioid_drug_cost": "%s_opioid_drug_cost" % item,
                "opioid_day_supply": "%s_opioid_day_supply" % item,
                "opioid_bene_count": "%s_opioid_bene_count" % item,
                "opioid_prescriber_rate": "%s_opioid_prescriber_rate" % item,
            }
        )
    )
    df = df.merge(tmp_df, how="left", on=item)
    embedded_cat_features += [
        "%s_opioid_claim_count" % item,
        "%s_opioid_drug_cost" % item,
        "%s_opioid_day_supply" % item,
        "%s_opioid_bene_count" % item,
        "%s_opioid_prescriber_rate" % item,
    ]

# Drop unembedded categorical variables and some others
df = df[["npi"] + CON_VARS + embedded_cat_features]

# Write features file
df.to_csv(OUT_FILE, index=False)
Feature Selection
Once the features are generated, we can implement the above-mentioned feature selection algorithm. We start by importing some libraries, setting some parameters, and loading the features into a Pandas dataframe.
In [3]:
# Import libs
import sys
import os
import time
import numpy as np
import pandas as pd
from qci_client import QciClient

# Define some parameters
FEATURES_FILE = "medicare_features.csv"
REDUCED_DIM = 10

# Read features
df = pd.read_csv(FEATURES_FILE, low_memory=False)
We now collect the feature names and get the total count of features in the dataset,
In [4]:
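# Note: a set has no guaranteed iteration order, so the order of
# feature_names can differ between sessions; it stays fixed within one session.
feature_names = list(set(df.columns) - {"npi"})
orig_dim = len(feature_names)
print(
    "Original dimension is %d; reduced dimension will be %d"
    % (orig_dim, REDUCED_DIM)
)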
Out [ ]:
Original dimension is 93; reduced dimension will be 10
We now create the objective matrix, that is, the absolute correlation matrix of the features with its diagonal removed,
In [5]:
# Generate the objective matrix
X = np.array(df[feature_names])
C = np.abs(np.corrcoef(X, rowvar=False))

# Make correlation symmetric to machine precision
C = 0.5 * (C + C.transpose())
objective = C - np.eye(orig_dim)
objective = np.array(objective, dtype=np.float32)
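One practical caveat when building a correlation-based objective: the Pearson correlation is undefined for a column with zero variance, and `np.corrcoef` returns NaN for it. The sketch below, on made-up toy data, shows the effect and one possible guard; `drop_constant_columns` is a hypothetical helper and is not part of the tutorial's pipeline.

```python
import numpy as np


def drop_constant_columns(X, names):
    """Remove zero-variance columns, for which correlation is undefined."""
    keep = X.std(axis=0) > 0
    kept_names = [name for name, flag in zip(names, keep) if flag]
    return X[:, keep], kept_names


X = np.array([
    [1.0, 5.0, 2.0],
    [2.0, 5.0, 1.0],
    [3.0, 5.0, 4.0],
])
names = ["a", "constant", "b"]

# The constant column produces NaN entries in the correlation matrix.
with np.errstate(invalid="ignore", divide="ignore"):
    raw = np.abs(np.corrcoef(X, rowvar=False))
print(np.isnan(raw).any())

# After dropping it, the objective matrix is free of NaN values.
X_clean, names_clean = drop_constant_columns(X, names)
objective = np.abs(np.corrcoef(X_clean, rowvar=False)) - np.eye(X_clean.shape[1])
print(names_clean, np.isnan(objective).any())
```

Checking `np.isnan(objective).any()` before uploading the matrix is a cheap way to catch this situation.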
Next we create the constraint, stored as a row of ones (one coefficient per feature) followed by the negated subset size,
In [6]:
# Generate the constraint: one coefficient per feature, then the negated subset size
cons_lhs = np.ones(shape=(orig_dim), dtype=np.float32)
cons_rhs = np.array([-REDUCED_DIM])
constraints = np.hstack([cons_lhs, cons_rhs])
print(constraints.shape)
Out [ ]:
(94,)
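The layout of this vector can be checked locally. The sketch below assumes the convention suggested by the cell above, namely that the last entry holds the negated right-hand side, so a selection is feasible when the coefficients times the selection plus that last entry equals zero. The helper `constraint_residual` and the small dimensions are illustrative only.

```python
import numpy as np

orig_dim, reduced_dim = 6, 2

# Same layout as the tutorial cell: coefficients, then the negated right-hand side.
constraints = np.hstack([
    np.ones(orig_dim, dtype=np.float32),
    np.array([-reduced_dim]),
])


def constraint_residual(z, constraints):
    """Return (coefficients . z) minus the right-hand side; zero means feasible."""
    coefficients, neg_rhs = constraints[:-1], constraints[-1]
    return float(coefficients @ np.asarray(z, dtype=float) + neg_rhs)


print(constraint_residual([1, 0, 0, 1, 0, 0], constraints))  # two features selected
print(constraint_residual([1, 1, 0, 1, 0, 0], constraints))  # three features selected
```

A residual of zero corresponds to selecting exactly the requested number of features; any other value measures how far the selection is from the target size.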
We now solve the above quadratic binary problem using QCi's Dirac-1,
In [7]:
token = "your_token"
api_url = "https://api.qci-prod.com"
qci = QciClient(api_token=token, url=api_url)
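Rather than pasting a token into the notebook, the credentials can be read from the environment. The sketch below is one way to do that; the variable names `QCI_TOKEN` and `QCI_API_URL` and the helper `load_qci_credentials` are assumptions for this example, so check the client documentation for the names it actually recognizes.

```python
import os


def load_qci_credentials(env=os.environ):
    """Read the API token and URL from a mapping such as os.environ.

    The variable names used here are hypothetical.
    """
    token = env.get("QCI_TOKEN")
    if not token:
        raise RuntimeError("Set QCI_TOKEN before creating the client.")
    api_url = env.get("QCI_API_URL", "https://api.qci-prod.com")
    return token, api_url


# An explicit mapping is passed so the sketch runs without real credentials.
token, api_url = load_qci_credentials({"QCI_TOKEN": "your_token"})
print(api_url)
```

The returned pair can then be passed to `QciClient(api_token=token, url=api_url)` exactly as in the cell above.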
In [8]:
# Create json objects
objective_json = {
    "file_name": "objective_tutorial.json",
    "file_config": {
        "objective": {"data": objective, "num_variables": orig_dim},
    },
}
constraint_json = {
    "file_name": "constraints_tutorial.json",
    "file_config": {
        "constraints": {
            "data": constraints,
            # "num_variables": orig_dim,
            # "num_constraints": 1,
        }
    },
}

# Solve the optimization problem
objective_file_id = qci.upload_file(file=objective_json)["file_id"]
constraint_file_id = qci.upload_file(file=constraint_json)["file_id"]

# Setup job json
job_params = {
    "device_type": "dirac-1",
    "alpha": 5.0,
    "num_samples": 20,
}
body = qci.build_job_body(
    job_type="sample-constraint",
    job_params=job_params,
    constraints_file_id=constraint_file_id,
    objective_file_id=objective_file_id,
    job_name="tutorial_eqc1",
    job_tags=["tutorial_eqc1"],
)

# Run the job
job_response_json = qci.process_job(job_body=body)
print(job_response_json)

results = job_response_json["results"]
energies = results["energies"]
samples = results["solutions"]
is_feasibles = results["feasibilities"]
print("Energies:", energies)

# Pick a feasible solution with lowest energy
# The sample solutions are sorted by energy
sol = None
is_feasible = False
for i, item in enumerate(samples):
    sol = item
    is_feasible = is_feasibles[i]
    if is_feasible:
        break

if not is_feasible:
    print("Solution is not feasible!")

assert sol is not None, "No feasible solution found!"
assert len(sol) == orig_dim, "Inconsistent solution size!"
assert sum(sol) == REDUCED_DIM, "Solution is not feasible!"
Out [ ]:
2024-05-08 10:28:12 - Dirac allocation balance = 0 s (unmetered) 2024-05-08 10:28:12 - Job submitted: job_id='663bb62cd448b017e54f94bd' 2024-05-08 10:28:12 - QUEUED 2024-05-08 10:28:15 - RUNNING 2024-05-08 10:48:23 - COMPLETED 2024-05-08 10:48:26 - Dirac allocation balance = 0 s (unmetered) {'job_info': {'job_id': '663bb62cd448b017e54f94bd', 'job_submission': {'job_name': 'tutorial_eqc1', 'job_tags': ['tutorial_eqc1'], 'problem_config': {'quadratic_linearly_constrained_binary_optimization': {'constraints_file_id': '663bb62c98263204a3657526', 'objective_file_id': '663bb62b98263204a3657524', 'alpha': 5, 'atol': 1e-10}}, 'device_config': {'dirac-1': {'num_samples': 20}}}, 'job_status': {'submitted_at_rfc3339nano': '2024-05-08T17:28:12.687Z', 'queued_at_rfc3339nano': '2024-05-08T17:28:12.689Z', 'running_at_rfc3339nano': '2024-05-08T17:28:13.046Z', 'completed_at_rfc3339nano': '2024-05-08T17:48:21.532Z'}, 'job_result': {'file_id': '663bbae598263204a3657528', 'device_usage_s': 1128}}, 'status': 'COMPLETED', 'results': {'counts': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'energies': [-497.8916331699111, -497.8896800449111, -497.8896800449111, -497.8887034824111, -497.8847972324111, -497.8847972324111, -497.8847972324111, -497.8838206699111, -497.8789378574111, -497.8623362949111, -497.8584300449111, -497.8515941074111, -497.8291331699111, -497.8125316074111, -497.7988597324111, -497.7949534824111, -497.7773753574111, -497.7773753574111, -497.7724925449111, -497.7724925449111], 'feasibilities': [True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True], 'objective_values': [2.108758625341579, 2.109875851077959, 2.109875851077959, 2.1111518519464876, 2.1151239282917222, 2.1151239282917227, 2.115212611388415, 2.1162998459767546, 2.121476054191589, 2.13726658251835, 2.1418038606643672, 2.148616509046405, 2.1707787389168516, 2.1876401392510156, 2.2009708418045193, 2.2050841469317675, 
2.2229102615965526, 2.2229102615965526, 2.227802827401319, 2.227861546212807], 'solutions': [[0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0], [1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 
1, 0, 0, 0, 0, 0, 0, 1, 0], [1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0], [1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0], [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]]}} Energies: [-497.8916331699111, -497.8896800449111, -497.8896800449111, -497.8887034824111, -497.8847972324111, -497.8847972324111, -497.8847972324111, -497.8838206699111, -497.8789378574111, -497.8623362949111, -497.8584300449111, -497.8515941074111, -497.8291331699111, 
-497.8125316074111, -497.7988597324111, -497.7949534824111, -497.7773753574111, -497.7773753574111, -497.7724925449111, -497.7724925449111]
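In the output above, each energy sits roughly 500 below the corresponding objective value (for example, 2.1088 against -497.8916), and 500 equals `alpha` times `REDUCED_DIM` squared. One reading consistent with these numbers, though not confirmed by the output itself, is that the constraint enters the energy as a quadratic penalty whose constant term is dropped. The sketch below illustrates that assumed form on random toy data; `penalized_energy` is a hypothetical helper and not part of `qci_client`.

```python
import numpy as np


def penalized_energy(z, objective, n_select, alpha):
    """Energy under an assumed penalty alpha * (sum(z) - n_select)**2,
    with the constant alpha * n_select**2 dropped."""
    z = np.asarray(z, dtype=float)
    objective_value = z @ objective @ z
    penalty = alpha * (z.sum() - n_select) ** 2
    return float(objective_value + penalty - alpha * n_select**2)


rng = np.random.default_rng(1)
C = np.abs(np.corrcoef(rng.normal(size=(50, 5)), rowvar=False))
objective = C - np.eye(5)

alpha, n_select = 5.0, 2
feasible = [1, 1, 0, 0, 0]    # exactly two features selected
infeasible = [1, 1, 1, 0, 0]  # one feature too many

print(penalized_energy(feasible, objective, n_select, alpha))
print(penalized_energy(infeasible, objective, n_select, alpha))
```

Under this form, a feasible sample has an energy equal to its objective value minus `alpha * n_select**2`, while an infeasible sample is pushed upward by the penalty, which is why sorting by energy tends to rank feasible samples first.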
Finally, we can print the list of selected variables,
In [9]:
selected_vars = []
for i in range(orig_dim):
    if sol[i] > 0:
        selected_vars.append(feature_names[i])
print(selected_vars)
Out [ ]:
['nppes_provider_city_opioid_drug_cost', 'beneficiary_race_black_count', 'nppes_provider_gender_opioid_drug_cost', 'specialty_description_opioid_drug_cost', 'nppes_provider_mi_opioid_drug_cost', 'nppes_entity_code_opioid_prescriber_rate', 'beneficiary_race_nat_ind_count', 'antipsych_drug_cost_ge65', 'beneficiary_average_risk_score', 'beneficiary_race_asian_pi_count']
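A simple way to sanity-check a selection is to compare the average absolute pairwise correlation of the chosen columns with that of the full feature set. The sketch below does this on synthetic data; `mean_abs_offdiag_corr` is a hypothetical helper, and with the tutorial's objects one would compare `df[selected_vars]` against `df[feature_names]`.

```python
import numpy as np
import pandas as pd


def mean_abs_offdiag_corr(frame):
    """Average absolute Pearson correlation over distinct column pairs."""
    C = np.abs(np.corrcoef(frame.to_numpy(dtype=float), rowvar=False))
    n = C.shape[0]
    return float((C.sum() - np.trace(C)) / (n * (n - 1)))


rng = np.random.default_rng(2)
a = rng.normal(size=300)
toy = pd.DataFrame({
    "a": a,
    "a_copy": a + 0.01 * rng.normal(size=300),  # nearly redundant with "a"
    "b": rng.normal(size=300),
})

print(mean_abs_offdiag_corr(toy))              # all features
print(mean_abs_offdiag_corr(toy[["a", "b"]]))  # a less redundant subset
```

A well-chosen subset should show a noticeably lower value than the full set, as the second number does here.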