The text data format is as follows:
0001369157| 101| seizure (-) | G40.901| epilepsy | G40.901 $| epilepsy $| 1946.56
0001370448| 101| seizure (-) | G40.901| epilepsy | G40.901$J40.x00 $| epilepsy $bronchitis $| 2842.77
0001374918| 101| seizure (-) | R56.001| febrile convulsion | R56.001$J03.900 $| febrile convulsion $acute tonsillitis $| 1813.14
0001358030| 101| seizure (-) | R56.001| febrile convulsion | R56.001$J03.900 $| febrile convulsion $acute tonsillitis $| 2209.05
0001368014| 101| seizure (-) | G41.900| status epilepticus | G41.900 $| status epilepticus $| 2986.82
0001384553| 101| seizure (-) | G40.103| symptomatic focal epilepsy | G40.103 $| symptomatic focal epilepsy $| 1944.66
0001383190| 101| seizure (-) | R56.001| febrile convulsion | R56.001$J06.900 $| febrile convulsion $acute upper respiratory infection $| 2532.59
The task is to group the records by id_drg (field 1), pdxCode (field 3) and sdxNames (field 6, a "$"-separated list of diagnosis names), and for each group to output the number of medical records together with the expense of each record.
The code is as follows:
import scala.collection.mutable.ListBuffer
import scala.io.Source

/**
  * Created by Shea on 2018/11/24.
  */
object GroupBy2Text extends App {
  val path = "C:\\Users\\Shea\\Desktop\\drgs.txt"
  val encode = "utf-8"
  val res = getCategoryAll(path, encode, "\\|", "1,3")("6", "\\$")
  getGroupRes(res, path, encode, "\\|", "1,3")("6", "\\$")("7", "4")

  /**
    * Collect all distinct groups.
    * @param path       file path
    * @param encode     file encoding
    * @param delimiter  separator between fields (a regex)
    * @param indexs     comma-separated indexes of the fields to group by
    * @param splitFiled index of a grouping field whose value is several values joined together
    *                   and must be split into a set; empty if there is no such field
    * @param splitFlag  separator used inside that field (a regex)
    */
  def getCategoryAll(path: String, encode: String, delimiter: String, indexs: String)
                    (splitFiled: String = "", splitFlag: String): List[Set[String]] = {
    val groupList = new ListBuffer[Set[String]]
    val source = Source.fromFile(path, encode)
    try source.getLines().foreach { line =>
      val arr = line.split(delimiter)
      val fields = indexs.split("\\,")
      // Match on splitFiled (not delimiter): an empty value means there is no field to split
      val specialField: Set[String] = splitFiled match {
        case "" => Set("")
        case _ => arr(splitFiled.toInt).split(splitFlag).toSet
      }
      // Join the common grouping fields; "\u0001" is the same character as the octal
      // escape "\001", which newer Scala versions no longer accept
      val common = fields.map(index => arr(index.toInt)).mkString("\u0001")
      // Combine with the values of the split field
      val groupContent = specialField ++ Set(common)
      groupList.append(groupContent)
    } finally source.close()
    // All distinct groups
    val groupRes: List[Set[String]] = groupList.distinct.toList
    groupRes
  }

  /**
    * Print the result for each group.
    * @param groupRes   the distinct groups returned by getCategoryAll
    * @param path       file path
    * @param encode     file encoding
    * @param delimiter  separator between fields (a regex)
    * @param indexs     comma-separated indexes of the fields to group by
    * @param splitFiled index of the grouping field to split into a set
    * @param splitFlag  separator used inside that field (a regex)
    * @param parms      indexes of the other fields to output for each record
    */
  def getGroupRes(groupRes: List[Set[String]], path: String, encode: String, delimiter: String, indexs: String)
                 (splitFiled: String = "", splitFlag: String)(parms: String*): Unit = {
    for (str <- groupRes) {
      var count = 0        // number of records in this group
      var groupFileds = "" // the joined common grouping fields
      // Collects the other output fields of this group; one buffer per field could be used instead
      val buffer = new StringBuffer()
      val source = Source.fromFile(path, encode)
      try source.getLines().foreach { line =>
        val arr = line.split(delimiter)
        val fields = indexs.split("\\,")
        val specialField: Set[String] = splitFiled match {
          case "" => Set("")
          case _ => arr(splitFiled.toInt).split(splitFlag).toSet
        }
        // Join the common grouping fields
        val common = fields.map(index => arr(index.toInt)).mkString("\u0001")
        // Combine with the values of the split field
        val groupContent = specialField ++ Set(common)
        if (str == groupContent) {
          count += 1
          groupFileds = common
          val cont = parms.map(param => arr(param.toInt)).mkString(",").trim
          buffer.append(cont + "@")
          /*akc190s.append(akc190+",")
          yka055s.append(arr(7)+",")
          pdx=pdxCode+","+pdxName*/
        }
      } finally source.close()
      // Values of the split grouping field
      val specialField = str.filterNot(_.contains("\u0001")).mkString(",")
      // Print the final result
      println(s"${groupFileds}|${specialField}|${count}|${buffer.toString}")
    }
  }
}
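The final output is as follows (the control character that joins id_drg and pdxCode is non-printing, so the two values appear run together, e.g. 101G40.901):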
101G40.901| epilepsy | 1| 1946.56, epilepsy@
101G40.901|epilepsy, bronchitis|1|2842.77, epilepsy@
101R56.001| febrile convulsion, acute tonsillitis | 2| 1813.14, febrile convulsion @ 2209.05, febrile convulsion@
101G41.900| status epilepticus | 1| 2986.82, status epilepticus@
101G40.103| symptomatic focal epilepsy | 1| 1944.66, symptomatic focal epilepsy@
101R56.001| febrile convulsion, acute upper respiratory infection | 1| 2532.59, febrile convulsion@
The result shows that, within the same DRG (101), the 7 records fall into 6 combinations of principal diagnosis + diagnosis names. Only one combination contains more than one record:
Principal diagnosis: R56.001 -> febrile convulsion;
Diagnosis names: febrile convulsion + acute tonsillitis;
This combination has two cases, costing 1813.14 and 2209.05 respectively.
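
The code above re-reads the file once for every group. As a rough alternative, the same grouping can be done in a single pass with the standard `groupBy` collection method. The sketch below is not the author's code: the names `GroupBySketch`, `Key`, `parse` and `summarize` are hypothetical, it works on lines already held in memory, and it assumes that trimming the whitespace around each field is acceptable (the original code does not trim).

```scala
object GroupBySketch {
  // Grouping key: (id_drg, pdxCode, set of diagnosis names). Hypothetical type alias.
  type Key = (String, String, Set[String])

  /** Parse one '|'-delimited record into its grouping key and its expense (kept as text). */
  def parse(line: String): (Key, String) = {
    val f = line.split("\\|").map(_.trim)
    // Field 6 holds the diagnosis names joined by '$'; split it into a set
    val names = f(6).split("\\$").map(_.trim).filter(_.nonEmpty).toSet
    ((f(1), f(3), names), f(7))
  }

  /** Group the records by key in one pass; each value lists the expenses of that group. */
  def summarize(lines: Seq[String]): Map[Key, Seq[String]] =
    lines.map(parse).groupBy(_._1).map { case (key, recs) => key -> recs.map(_._2) }

  def main(args: Array[String]): Unit = {
    // Three records taken from the sample data above
    val sample = Seq(
      "0001374918| 101| seizure (-) | R56.001| febrile convulsion | R56.001$J03.900 $| febrile convulsion $acute tonsillitis $| 1813.14",
      "0001358030| 101| seizure (-) | R56.001| febrile convulsion | R56.001$J03.900 $| febrile convulsion $acute tonsillitis $| 2209.05",
      "0001383190| 101| seizure (-) | R56.001| febrile convulsion | R56.001$J06.900 $| febrile convulsion $acute upper respiratory infection $| 2532.59"
    )
    summarize(sample).foreach { case ((drg, pdx, names), costs) =>
      val nameList = names.mkString(",")
      val costList = costs.mkString(",")
      println(s"$drg|$pdx|$nameList|${costs.size}|$costList")
    }
  }
}
```

Using a tuple as the key avoids the non-printing join character, and using a `Set` for the diagnosis names keeps the comparison independent of their order, as in the original code. For a file on disk, the lines could be read once with `scala.io.Source` and passed to `summarize`.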