Scala - JSON parsing optimization

I Introduction

At work we use com.alibaba.fastjson and have run into some time-consuming scenarios. Here are a few of them, along with some simple optimizations.

II Storage format of JSON information

The scenario is the most basic one: a set of key-value (kv) strings. The data is loaded in the driver of a Spark program. How long the driver takes directly determines when tasks start on the executors, so the sooner the driver finishes, the shorter the running time of the whole program. Here is the code:

    import com.alibaba.fastjson.JSON
    import scala.collection.mutable

    // Parse the same 11 key-value pairs `count` times, either from a JSON
    // string (isJson = true) or from a "key-value,key-value" plain string.
    def countDiff(isJson: Boolean, count: Int): Unit = {
      val st = System.currentTimeMillis()
      if (isJson) {
        (0 until count).foreach { _ =>
          val jsonInfo = "{\"0\":\"A\",\"1\":\"B\",\"2\":\"C\",\"3\":\"D\",\"4\":\"E\",\"5\":\"F\",\"6\":\"G\",\"7\":\"H\",\"8\":\"I\",\"9\":\"J\",\"10\":\"K\"}"
          // Parse with fastjson, then copy every field into a Scala map
          val js = JSON.parseObject(jsonInfo)
          val infoMap = new mutable.HashMap[String, String]()
          (0 to 10).foreach { key =>
            infoMap.put(key.toString, js.getString(key.toString))
          }
        }
      } else {
        (0 until count).foreach { _ =>
          val stringInfo = "0-A,1-B,2-C,3-D,4-E,5-F,6-G,7-H,8-I,9-J,10-K"
          // Split into "key-value" pairs, then split each pair on "-"
          val infoMapV2 = new mutable.HashMap[String, String]()
          stringInfo.split(",").foreach { kv =>
            val info = kv.split("-")
            infoMapV2.put(info(0), info(1))
          }
        }
      }
      val end = System.currentTimeMillis()
      println(s"isJson: $isJson count: $count cost: ${end - st} ms")
    }
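
`countDiff` times a single pass with `System.currentTimeMillis()`, so each measurement also includes JIT warm-up and, on the first call, class loading. A slightly more careful sketch runs the work untimed first and then times it with `System.nanoTime()`. `WarmBench` and `timeMs` are hypothetical names, not part of the original program; for serious numbers a dedicated harness such as JMH is the more rigorous option.

```scala
object WarmBench {
  // Hypothetical helper: run `body` `warmup` times untimed so the JIT can
  // compile it, then time `runs` further executions with System.nanoTime.
  def timeMs(warmup: Int, runs: Int)(body: => Unit): Double = {
    (0 until warmup).foreach(_ => body)
    val start = System.nanoTime()
    (0 until runs).foreach(_ => body)
    (System.nanoTime() - start) / 1e6
  }

  def main(args: Array[String]): Unit = {
    val line = "0-A,1-B,2-C,3-D,4-E,5-F,6-G,7-H,8-I,9-J,10-K"
    // Accumulate a result so the split is not trivially dead code
    var sink = 0
    val ms = timeMs(warmup = 10000, runs = 100000) {
      sink += line.split(",").length
    }
    println(f"split x100000: $ms%.1f ms (sink=$sink)")
  }
}
```

The same wrapper can be put around either branch of `countDiff` so that both modes are measured after warm-up.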

The advantage of JSON is that it is easy to extend, so it suits programs that need strong extensibility. If the data format is relatively fixed, there is no need to stick to JSON: parsing a standardized String by splitting it into an array can be more efficient. Here the same kv information is stored once as a JSON string and once as a plain String. Compare the parsing speed:

| count | JsonString (ms) | ArrayString (ms) |
| --- | --- | --- |
| 10000 | 340 | 107 |
| 100000 | 492 | 229 |

Only 11 fields (keys 0 to 10) are simulated here. As the number of fields grows, the array approach can bring greater speed and lower memory usage, but only if the data format is relatively stable and extensibility requirements are low; otherwise JSON is the better fit.
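
When the layout is that fixed, the keys do not even need to be stored: the position in the array can serve as the key. A minimal sketch, assuming a hypothetical fixed schema in which field i always holds the value for key i and no value contains a comma; `PositionalParse`, `FieldCount` and `parsePositional` are illustrative names, not part of the original program. Passing -1 as the limit to `split` keeps trailing empty fields, which `split(",")` would otherwise drop silently.

```scala
object PositionalParse {
  // Hypothetical fixed schema: 11 fields, the value for key i lives at index i
  val FieldCount = 11

  // Parse "A,B,...,K" into an array whose index is the key.
  // The -1 limit keeps trailing empty fields (e.g. "A,B," yields 3 fields).
  def parsePositional(line: String): Array[String] = {
    val fields = line.split(",", -1)
    require(fields.length == FieldCount,
      s"expected $FieldCount fields, got ${fields.length}")
    fields
  }

  def main(args: Array[String]): Unit = {
    val values = parsePositional("A,B,C,D,E,F,G,H,I,J,K")
    println(values(3)) // value stored for key "3"
  }
}
```

The `require` check is the price of dropping the keys: if the schema ever changes, a mismatched line fails loudly instead of silently shifting values to the wrong keys.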

III Choosing the precision of numbers in JSON

Numeric values are often stored in JSON as strings, and computation results are sometimes written into JSON directly without any processing. The problem is that such high precision is often unnecessary and takes up a lot of extra storage space. So when storing numeric data in JSON, consider the precision your own scenario actually requires; when efficiency becomes a serious problem, you can trade some precision for speed:

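    import com.alibaba.fastjson.JSON
    import scala.jdk.CollectionConverters._ // Scala 2.13+; on 2.12 use scala.collection.JavaConverters._

    // Parse 100 numeric fields (keys "0" to "99") `count` times; mode "long"
    // uses high-precision values, any other mode uses double-precision values.
    def countParseDiff(mode: String, count: Int): Unit = {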
      val st = System.currentTimeMillis()
      if (mode == "long") {
        (0 until count).foreach(num => {
          val jsStringLong = "{\"88\":\"0.162510882243847475062391508306752374\",\"89\":\"0.75163830436479464173691490471724633\",\"90\":\"0.90213881974928984813815080861401011\",\"91\":\"0.1875639607945092204281331045109613\",\"92\":\"0.3945177134094317992893882329281237\",\"93\":\"0.443392380272448658513319970774621658\",\"94\":\"0.77182607059257973081356609797223624\",\"95\":\"0.50256119960591761686566240207932918\",\"96\":\"0.5436354507843749675738857329102989\",\"97\":\"0.26336209509530252723468883192026759\",\"10\":\"0.43223417962825454959029330528989392\",\"98\":\"0.65961507889175347295331928301681895\",\"11\":\"0.73002571997508788970640871896359086\",\"99\":\"0.69490546363091621959148019445393714\",\"12\":\"0.80419937594465358375609071155597433\",\"13\":\"0.57130856512946875991170199237267025\",\"14\":\"0.398427890033879520027630561420966\",\"15\":\"0.85680415843134176331851338806838752\",\"16\":\"0.5618272084238763580017849972272359\",\"17\":\"0.190936285304858844214235541288429568\",\"18\":\"0.7502825715131136381982941037092364\",\"19\":\"0.099464508077999893727568964421983985\",\"0\":\"0.222645275619173136602671441669516519\",\"1\":\"0.78525192630180045824117648090534560\",\"2\":\"0.0014289816474472028487090913858026220\",\"3\":\"0.73055196568267071833504758826952868\",\"4\":\"0.6146026730519335888293128722211736\",\"5\":\"0.93311507126599168358232366320764801\",\"6\":\"0.69776103418108444326561403452677978\",\"7\":\"0.89471205771733098387666545095883205\",\"8\":\"0.60965235420204424395289906073765848\",\"9\":\"0.148146405092809074023799126433771964\",\"20\":\"0.9119458944424198470866510537670215\",\"21\":\"0.99548939891347984519037057015335097\",\"22\":\"0.89884949492817324184738437083847691\",\"23\":\"0.24295095597555458468237053922265661\",\"24\":\"0.31983940212667155463510135088480851\",\"25\":\"0.3619622690709118147944857225563547\",\"26\":\"0.98708041704570223069808821868483279\",\"27\":\"0.73770310629710731958849949523464054\",\"28\":\"0.253892748976222274027544648730984779\",\"29\":\"0.418231499930383671944518800209715879\",\"30\":\"0.093438112896236492664855662693530414\",\"31\":\"0.71645314429333622011403042248275755\",\"32\":\"0.72578104375401137254805613073022718\",\"33\":\"0.367622772196358971169648929427076638\",\"34\":\"0.9651087457593727209463890511061977\",\"35\":\"0.93613268928043575146409532065299900\",\"36\":\"0.229400763555740246150723219172839676\",\"37\":\"0.013978171736951261222881505119205247\",\"38\":\"0.331597645673472165274659610202350910\",\"39\":\"0.50817090103061015870652494868209761\",\"40\":\"0.75432525605255177310874447004217709\",\"41\":\"0.97550523481070775605663805443603601\",\"42\":\"0.099489728367332615987525730932248971\",\"43\":\"0.53046479672237435854609361200698524\",\"44\":\"0.79956526853335872757913726557880530\",\"45\":\"0.32568902655551894050494340098041679\",\"46\":\"0.9791554624109651031782389261818704\",\"47\":\"0.98430338008314435612449641990748013\",\"48\":\"0.5634158753426296544173731196146313\",\"49\":\"0.64954839429878476463172431184568325\",\"50\":\"0.344184160375300752486865908489860188\",\"51\":\"0.50952645792498753319142852581941981\",\"52\":\"0.73517128334284722251181987481288774\",\"53\":\"0.7535177942929978922006214118166552\",\"54\":\"0.37432546454507319198713640156471298\",\"55\":\"0.92119963394730394575121550260804010\",\"56\":\"0.5644883133027516345288815934146649\",\"57\":\"0.51723581108412648011195455457985381\",\"58\":\"0.78272025908229413533564139396612293\",\"59\":\"0.58108666048111332174970283402206024\",\"60\":\"0.134180223105083045194091865964649168\",\"61\":\"0.62552164145708891684505120299153380\",\"62\":\"0.67820260189090552537868320540878432\",\"63\":\"0.987200481586529186076835224402724\",\"64\":\"0.0122218867417818184111904989194471802\",\"65\":\"0.16882084843021654189813969801237034\",\"66\":\"0.363453572803890237381547386535950531\",\"67\":\"0.194713961913574482689930354085703428\",\"68\":\"0.70874762991191886110330351631153126\",\"69\":\"0.35040161999009386857444510587041724\",\"70\":\"0.11978843017774928501326394172649485\",\"71\":\"0.93151191040940767584193796285966304\",\"72\":\"0.61382313107106912019358703000816484\",\"73\":\"0.78858271137880386627593080956996183\",\"74\":\"0.279358352495240551033160944757224509\",\"75\":\"0.5216654156338536717090892611097678\",\"76\":\"0.3417722938666448180642040312126211\",\"77\":\"0.152274417383674539185594114464461656\",\"78\":\"0.135547985697930256270874724292599229\",\"79\":\"0.98010966218205017589952176389685906\",\"80\":\"0.6578406413689183814759456971693968\",\"81\":\"0.11811379069445966878415907837253648\",\"82\":\"0.492349990893067868703871325765922756\",\"83\":\"0.89180681867081564796162431769746658\",\"84\":\"0.41297650311072024721212667454759836\",\"85\":\"0.88747531121701528404131155717887919\",\"86\":\"0.90349795471736533544196982084241111\",\"87\":\"0.9386748762149012568763468813023934\"}"
          val jsonLong = JSON.parseObject(jsStringLong)
          val tmpMap = scala.collection.mutable.HashMap[String, String]()
          jsonLong.keySet().asScala.foreach(key => {
            tmpMap.put(key, jsonLong.getString(key))
          })
        })
      } else {
        (0 until count).foreach(num => {
          val jsStringShort = "{\"88\":\"0.8158929079196815\",\"89\":\"0.8472558210241492\",\"90\":\"0.7571396105693562\",\"91\":\"0.5141626021623498\",\"92\":\"0.8439050192147498\",\"93\":\"0.7271457417727551\",\"94\":\"0.891149114837512\",\"95\":\"0.27072953690327606\",\"96\":\"0.3055197951988664\",\"97\":\"0.20438177835154125\",\"10\":\"0.766827079232007\",\"98\":\"0.2021272684779355\",\"11\":\"0.03283323457008702\",\"99\":\"0.6737307208249325\",\"12\":\"0.25414909501413907\",\"13\":\"0.29072593216276177\",\"14\":\"0.49620931136007596\",\"15\":\"0.27933905816569393\",\"16\":\"0.0531904152415984\",\"17\":\"0.07704317584508491\",\"18\":\"0.02277015566477736\",\"19\":\"0.5486711063871702\",\"0\":\"0.0469614604776194\",\"1\":\"0.16070961267018247\",\"2\":\"0.8817701316974628\",\"3\":\"0.6682986338063475\",\"4\":\"0.656183253579701\",\"5\":\"0.6971199797337868\",\"6\":\"0.3908398271824104\",\"7\":\"0.6170198799911262\",\"8\":\"0.44708748161186385\",\"9\":\"0.048810076841649286\",\"20\":\"0.640962760597961\",\"21\":\"0.6641403527945248\",\"22\":\"0.4093385079380656\",\"23\":\"0.3334890519668272\",\"24\":\"0.9324013001228181\",\"25\":\"0.03190491294110198\",\"26\":\"0.055387101570207875\",\"27\":\"0.7647134525612731\",\"28\":\"0.4621310378408401\",\"29\":\"0.3347251667924894\",\"30\":\"0.02826046923836001\",\"31\":\"0.2714550363961027\",\"32\":\"0.5332476764257922\",\"33\":\"0.3402800369793736\",\"34\":\"0.875372224171887\",\"35\":\"0.024472111744378244\",\"36\":\"0.46909268832140993\",\"37\":\"0.34724598521452843\",\"38\":\"0.7505323191581331\",\"39\":\"0.5356235355136567\",\"40\":\"0.592686473635947\",\"41\":\"0.31229813640284376\",\"42\":\"0.30963475186761635\",\"43\":\"0.6031667823226999\",\"44\":\"0.8836793918733008\",\"45\":\"0.5622908790934571\",\"46\":\"0.10925686767648335\",\"47\":\"0.5163839503886498\",\"48\":\"0.7441258412347347\",\"49\":\"0.5192604698310477\",\"50\":\"0.17629156523735756\",\"51\":\"0.06290113187088997\",\"52\":\"0.942081052292314\",\"53\":\"0.32390055400844764\",\"54\":\"0.007553976347384084\",\"55\":\"0.4181993719998841\",\"56\":\"0.5959668872339888\",\"57\":\"0.10741883217518355\",\"58\":\"0.5053748831086992\",\"59\":\"0.4391985327649449\",\"60\":\"0.012846804029912007\",\"61\":\"0.9321912110615005\",\"62\":\"0.25851270225096634\",\"63\":\"0.23242769026053833\",\"64\":\"0.3001966807927454\",\"65\":\"0.09838766955590339\",\"66\":\"0.13728693139240056\",\"67\":\"0.12217906283919577\",\"68\":\"0.9334528395460703\",\"69\":\"0.5967513242937469\",\"70\":\"0.3876559519551116\",\"71\":\"0.3051747293220177\",\"72\":\"0.8805638192661605\",\"73\":\"0.976970913339818\",\"74\":\"0.2810321509109014\",\"75\":\"0.5456588027504454\",\"76\":\"0.45996852344653516\",\"77\":\"0.7965937024120928\",\"78\":\"0.9049483078810182\",\"79\":\"0.23846002545011236\",\"80\":\"0.8398676998486626\",\"81\":\"0.11718450069652409\",\"82\":\"0.864258577051585\",\"83\":\"0.04745273365850944\",\"84\":\"0.9743517566153443\",\"85\":\"0.22244377428833695\",\"86\":\"0.5033787213718174\",\"87\":\"0.1129771936952979\"}"
          val jsonShort = JSON.parseObject(jsStringShort)
          val tmpMap = scala.collection.mutable.HashMap[String, String]()
          jsonShort.keySet().asScala.foreach(key => {
            tmpMap.put(key, jsonShort.getString(key))
          })
        })
      }
      val end = System.currentTimeMillis()
      println(s"mode: $mode count: $count cost: ${end - st}")
    }

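Both strings store the same 100 keys ("0" to "99"). In the low-precision string each value is an ordinary double (about 16 to 17 significant digits); in the high-precision string each value is a double with the digits of a long appended, giving well over 30 digits. Compare the parsing times (as expected, the lower-precision data parses faster):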

| count | Low Precision (ms) | High Precision (ms) |
| --- | --- | --- |
| 10000 | 189 | 694 |
| 100000 | 562 | 1439 |

As the table shows, choosing an appropriate precision made parsing roughly 2.5 to 3.7 times faster in this test. So when storing numbers in JSON, their precision deserves consideration as well.
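
One way to apply this trade-off is to reduce precision once, when the values are written, instead of paying for it on every parse. The sketch below uses `java.math.BigDecimal` to round a decimal string to a chosen number of fractional digits, and alternatively converts it to the nearest `Double`, which is the kind of value the low-precision string above holds. `PrecisionTrim`, `trim` and `toDoublePrecision` are hypothetical names, and the right scale depends on the accuracy your own scenario requires.

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

object PrecisionTrim {
  // Hypothetical helper: round a decimal string to `scale` fractional digits
  // (half-up) before it is written into the JSON payload.
  def trim(value: String, scale: Int): String =
    new JBigDecimal(value).setScale(scale, RoundingMode.HALF_UP).toPlainString

  // Hypothetical helper: keep only double precision, i.e. just enough digits
  // to parse back to the same Double. Note that Double.toString switches to
  // E notation for magnitudes below 1e-3 or at least 1e7.
  def toDoublePrecision(value: String): String =
    value.toDouble.toString

  def main(args: Array[String]): Unit = {
    // Value of key "88" in the high-precision string above
    val high = "0.162510882243847475062391508306752374"
    val rounded = trim(high, 6)
    val asDouble = toDoublePrecision(high)
    println(s"$rounded (${high.length} -> ${rounded.length} chars)")
    println(s"$asDouble (${high.length} -> ${asDouble.length} chars)")
  }
}
```

`java.math.BigDecimal` is constructed directly from the string so that no digits are lost before rounding; Scala's `BigDecimal(String)` would first round to its default 34-digit `MathContext`.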

IV Summary

JSON is used in a wide range of scenarios, and even small changes in details like these can improve a program's efficiency.

Keywords: Scala JSON

Added by gimzo on Thu, 27 Jan 2022 02:34:03 +0200