Comparing two data files & Writing an Outfile

Discussion in 'Perl' started by filter, Jun 13, 2011.

  1. filter

    filter New Member

    Joined:
    Aug 4, 2007
    Messages:
    1
    Likes Received:
    0
    Trophy Points:
    0
    Hi Everyone,

    I have two files i.e. one file2 contains today's data and the other file1 contains Yesterday's data.

    The data in the files contains 226 columns and the data for the coulums separated by a Pipe "|" delimiter.

    Now, I have 4 Primary keys (coulumns) by which I have to compare file2 and file1 and generate a diff file that is the data that is not present in file2 but present in file1 and vice versa.

    The sample of the data that contains in file1 is ( 1 record):

    Code:
    START-OF-FILE
    FILENAME=fixedincome_bo_namr.dif
    DATA=bo
    REGION=namr
    TYPE=dif
    PROGRAMNAME=getdata
    DATEFORMAT=yyyymmdd
    
    START-OF-FIELDS
    # Security description
    TICKER
    CPN
    MATURITY
    SERIES
    NAME
    SHORT_NAME
    ISSUER_INDUSTRY
    MARKET_SECTOR_DES
    CPN_FREQ
    CPN_TYP
    MTY_TYP
    CALC_TYP_DES
    DAY_CNT
    MARKET_ISSUE
    COUNTRY
    CRNCY
    COLLAT_TYP
    AMT_ISSUED
    AMT_OUTSTANDING
    MIN_PIECE
    MIN_INCREMENT
    PAR_AMT
    LEAD_MGR
    EXCH_CODE
    REDEMP_VAL
    
    # Issuance information
    ANNOUNCE_DT
    FIRST_SETTLE_DT
    FIRST_CPN_DT
    INT_ACC_DT
    ISSUE_DT
    ISSUE_PX
    
    # Identifiers
    ID_EUROCLEAR
    ID_XTRAKTER
    ID_SEDOL1
    ID_SEDOL2
    ID_CEDEL
    ID_WERTPAPIER
    ID_ISIN
    ID_DUTCH
    ID_VALOREN
    ID_FRENCH
    ID_COMMON
    ID_JAPAN
    ID_BELGIUM
    ID_DANISH
    ID_AUSTRIAN
    ID_LUXEMBOURG
    ID_SWEDISH
    ID_NORWAY
    ID_JAPAN_COMPANY
    ID_SPAIN
    ID_ITALY
    ID_BB_COMPANY
    ID_BB_SECURITY
    ID_CUSIP
    
    # Schedules
    NXT_CALL_DT
    NXT_CALL_PX
    NXT_PAR_CALL_DT
    NXT_PUT_DT
    NXT_PUT_PX
    NXT_PAR_PUT_DT
    NXT_CPN_DT
    NXT_SINK_DT
    NXT_REFUND_DT
    
    # Ratings
    RTG_JCR
    
    # Floaters
    REFIX_FREQ
    NXT_REFIX_DT
    RESET_IDX
    
    # Inflation-indexed bonds
    
    # Preferreds
    PFD_DVD_PAY_DT
    PFD_EX_DVD_DT
    PFD_RST_DVD
    
    # Convertibles
    CV_COMMON_TICKER
    CV_COMMON_TICKER_EXCH
    CV_CNVS_RATIO
    CV_UNTIL
    CV_CNVS_FEXCH_RT
    CV_PROV_PX
    
    CALLABLE
    SINKABLE
    PUTABLE
    
    ID_BB_PARENT_CO
    PARENT_COMP_NAME
    PARENT_COMP_TICKER
    CNTRY_OF_INCORPORATION
    BASIC_SPREAD
    INDUSTRY_SECTOR
    INDUSTRY_GROUP
    INDUSTRY_SUBGROUP
    COUNTRY_GUARANTOR
    CNTRY_OF_DOMICILE
    SECURITY_DES
    144A_FLAG
    FLT_BENCH_MULTIPLIER
    SECURITY_TYP
    CV_START_DT
    GUARANTOR_NAME
    PREV_CPN_DT
    NXT_SINK_AMT
    CALL_DISCRETE
    PUT_DISCRETE
    MAKE_WHOLE_CALL
    ID_BB_UNIQUE
    LONG_COMP_NAME
    REDEMP_CRNCY
    CPN_CRNCY
    DTC_ELIGIBLE
    STRUCTURED_NOTE
    PCT_PAR_QUOTED
    PCS_QUOTE_TYP
    IS_UNIT_TRADED
    IS_REVERSE_CONVERTIBLE
    TRADE_CRNCY
    BEARER
    REGISTERED
    CALLED
    CALLED_DT
    ISSUER
    CALL_FEATURE
    PUT_FEATURE
    PENULTIMATE_CPN_DT
    FLT_CPN_CONVENTION
    CUR_CPN
    FLOATER
    TRADE_STATUS
    CDR_COUNTRY_CODE
    CDR_SETTLE_CODE
    SEASONING_STATUS
    FINAL_MATURITY
    PRVT_PLACE
    CALC_TYP
    REMOVAL_REASON
    IS_PERPETUAL
    IS_REG_S
    CALLED_PX
    DEFAULTED
    GILTS_EX_DVD_DT
    MOST_RECENT_REPORTED_FACTOR
    NXT_FACTOR_DT
    OID_BOND
    DELIVERY_TYP
    ID_SEDOL3
    ID_SEDOL4
    ID_SEDOL5
    SEDOL1_COUNTRY_ISO
    SEDOL2_COUNTRY_ISO
    SEDOL3_COUNTRY_ISO
    SEDOL4_COUNTRY_ISO
    SEDOL5_COUNTRY_ISO
    ID_MIC1
    ID_MIC2
    ID_MIC3
    ID_MIC4
    ID_MIC5
    CV_SH_PAR
    DUAL_CRNCY
    EXTENDIBLE
    EXCHANGEABLE
    IS_SOFT_CALL
    CV_MANDATORY_CNVS
    EU_SAVINGS_DIRECTIVE
    ID_CUSIP_REAL
    INDUSTRY_SUBGROUP_NUM
    SECURITY_TYP2
    LAST_REFIX_DT
    ISO_COUNTRY_GUARANTOR
    DTC_REGISTERED
    CALL_PARTIAL
    CV_CNVS_PX
    IS_CURRENT_GOVT
    FIRST_CALL_DT_ISSUANCE
    UNDL_ID_BB_UNIQUE
    EST_CPN_FLAG
    ID_BB_GUARANTOR
    IS_DAY_PAYER
    STEPUP_CPN
    STEPUP_DT
    CALC_MATURITY
    ID_EXCH_SYMBOL
    CREDIT_ENHANCEMENTS
    INSURANCE_STATUS
    JUNIOR
    SENIOR
    FLT_PAY_DAY
    FLT_DAYS_PRIOR
    INFLATION_LINKED_INDICATOR
    DAYS_TO_SETTLE
    TYPE_OF_BOND
    REFERENCE_INDEX
    BASE_CPI
    CFI_CODE
    CPN_FREQ_YLD_CNV
    DAY_PAYER_FREQ
    EX_DIV_DAYS
    EX_DIV_CALENDAR
    CONTINGENT_CONVERSION
    CONTRIB_DATA_INDICATOR
    SECURITY_FACTORABLE
    ID_BB_GLOBAL
    END-OF-FIELDS
    
    TIMESTARTED=Wed Jun  8 18:33:51 EDT 2011
      230 START-OF-DATA
      231 CP5101987 Corp|-1|198|C|7.300000|20110607|EAB|EUROPEAN AMERICAN BANK|LNB-CALL12/97|BANK|Corp|2|STEP CPN|CALLABLE|MULTI-STEP BOND|1|US DOMESTIC|US|USD|DE      POSIT NOTES|760000.00|.00|10000.0000|1000.0000|1000.00|LAS-sole|NOT LISTED|100.00000|19960607|19960607|19961207|19960607|19960607|100.000000| | | | | |       |US29874AZZ55| | | | | | | | | | | | | | |225433|500231|29874AZZ5| | | | | | | | | |N.A.| | | | | | | | | | | | |Y|N|N|197879|Citibank NA|8156Z|US| |Fin      ancial|Banks|Money Center Banks|N.A.|US|C 7.3 06/07/11|N| |US DOMESTIC| |N.A.| | |Y| |N|COCP5101987|European American Bank|USD|USD|Y|Y|Y|1|N|N|USD|N|N|Y      |19971207|EUROPEAN AMERICAN BANK|Semi-Annual| |19970607| | |N|N|US|US|Does Not Apply|20110607|N|421|MATURED|N|N|100.000000|N| |.000000000| |N|DTC| | | |      N.A.|N.A.|N.A.|N.A.|N.A.| | | | | | |N|N|N|N| |Grandfathered|29874AZZ5|145| | |N.A.|N|FULL (ONLY)| |N|19970607| | | |N| | |20110607| | |N|N|N| | |N|3| |       | |N.A.|2| | | | |N|N|BBG00048KQJ7|
    
    
    The sample of the data that contains in file2 (Todays file) is ( 1 record):

    Code:
    START-OF-FILE
    FILENAME=fixedincome_bo_namr.dif
    DATA=bo
    REGION=namr
    TYPE=dif
    PROGRAMNAME=getdata
    DATEFORMAT=yyyymmdd
    
    START-OF-FIELDS
    # Security description
    TICKER
    CPN
    MATURITY
    SERIES
    NAME
    SHORT_NAME
    ISSUER_INDUSTRY
    MARKET_SECTOR_DES
    CPN_FREQ
    CPN_TYP
    MTY_TYP
    CALC_TYP_DES
    DAY_CNT
    MARKET_ISSUE
    COUNTRY
    CRNCY
    COLLAT_TYP
    AMT_ISSUED
    AMT_OUTSTANDING
    MIN_PIECE
    MIN_INCREMENT
    PAR_AMT
    LEAD_MGR
    EXCH_CODE
    REDEMP_VAL
    
    # Issuance information
    ANNOUNCE_DT
    FIRST_SETTLE_DT
    FIRST_CPN_DT
    INT_ACC_DT
    ISSUE_DT
    ISSUE_PX
    
    # Identifiers
    ID_EUROCLEAR
    ID_XTRAKTER
    ID_SEDOL1
    ID_SEDOL2
    ID_CEDEL
    ID_WERTPAPIER
    ID_ISIN
    ID_DUTCH
    ID_VALOREN
    ID_FRENCH
    ID_COMMON
    ID_JAPAN
    ID_BELGIUM
    ID_DANISH
    ID_AUSTRIAN
    ID_LUXEMBOURG
    ID_SWEDISH
    ID_NORWAY
    ID_JAPAN_COMPANY
    ID_SPAIN
    ID_ITALY
    ID_BB_COMPANY
    ID_BB_SECURITY
    ID_CUSIP
    
    # Schedules
    NXT_CALL_DT
    NXT_CALL_PX
    NXT_PAR_CALL_DT
    NXT_PUT_DT
    NXT_PUT_PX
    NXT_PAR_PUT_DT
    NXT_CPN_DT
    NXT_SINK_DT
    NXT_REFUND_DT
    
    # Ratings
    RTG_JCR
    
    # Floaters
    REFIX_FREQ
    NXT_REFIX_DT
    RESET_IDX
    
    # Inflation-indexed bonds
    
    # Preferreds
    PFD_DVD_PAY_DT
    PFD_EX_DVD_DT
    PFD_RST_DVD
    
    # Convertibles
    CV_COMMON_TICKER
    CV_COMMON_TICKER_EXCH
    CV_CNVS_RATIO
    CV_UNTIL
    CV_CNVS_FEXCH_RT
    CV_PROV_PX
    
    CALLABLE
    SINKABLE
    PUTABLE
    
    ID_BB_PARENT_CO
    PARENT_COMP_NAME
    PARENT_COMP_TICKER
    CNTRY_OF_INCORPORATION
    BASIC_SPREAD
    INDUSTRY_SECTOR
    INDUSTRY_GROUP
    INDUSTRY_SUBGROUP
    COUNTRY_GUARANTOR
    CNTRY_OF_DOMICILE
    SECURITY_DES
    144A_FLAG
    FLT_BENCH_MULTIPLIER
    SECURITY_TYP
    CV_START_DT
    GUARANTOR_NAME
    PREV_CPN_DT
    NXT_SINK_AMT
    CALL_DISCRETE
    PUT_DISCRETE
    MAKE_WHOLE_CALL
    ID_BB_UNIQUE
    LONG_COMP_NAME
    REDEMP_CRNCY
    CPN_CRNCY
    DTC_ELIGIBLE
    STRUCTURED_NOTE
    PCT_PAR_QUOTED
    PCS_QUOTE_TYP
    IS_UNIT_TRADED
    IS_REVERSE_CONVERTIBLE
    TRADE_CRNCY
    BEARER
    REGISTERED
    CALLED
    CALLED_DT
    ISSUER
    CALL_FEATURE
    PUT_FEATURE
    PENULTIMATE_CPN_DT
    FLT_CPN_CONVENTION
    CUR_CPN
    FLOATER
    TRADE_STATUS
    CDR_COUNTRY_CODE
    CDR_SETTLE_CODE
    SEASONING_STATUS
    FINAL_MATURITY
    PRVT_PLACE
    CALC_TYP
    REMOVAL_REASON
    IS_PERPETUAL
    IS_REG_S
    CALLED_PX
    DEFAULTED
    GILTS_EX_DVD_DT
    MOST_RECENT_REPORTED_FACTOR
    NXT_FACTOR_DT
    OID_BOND
    DELIVERY_TYP
    ID_SEDOL3
    ID_SEDOL4
    ID_SEDOL5
    SEDOL1_COUNTRY_ISO
    SEDOL2_COUNTRY_ISO
    SEDOL3_COUNTRY_ISO
    SEDOL4_COUNTRY_ISO
    SEDOL5_COUNTRY_ISO
    ID_MIC1
    ID_MIC2
    ID_MIC3
    ID_MIC4
    ID_MIC5
    CV_SH_PAR
    DUAL_CRNCY
    EXTENDIBLE
    EXCHANGEABLE
    IS_SOFT_CALL
    CV_MANDATORY_CNVS
    EU_SAVINGS_DIRECTIVE
    ID_CUSIP_REAL
    INDUSTRY_SUBGROUP_NUM
    SECURITY_TYP2
    LAST_REFIX_DT
    ISO_COUNTRY_GUARANTOR
    DTC_REGISTERED
    CALL_PARTIAL
    CV_CNVS_PX
    IS_CURRENT_GOVT
    FIRST_CALL_DT_ISSUANCE
    UNDL_ID_BB_UNIQUE
    EST_CPN_FLAG
    ID_BB_GUARANTOR
    IS_DAY_PAYER
    STEPUP_CPN
    STEPUP_DT
    CALC_MATURITY
    ID_EXCH_SYMBOL
    CREDIT_ENHANCEMENTS
    INSURANCE_STATUS
    JUNIOR
    SENIOR
    FLT_PAY_DAY
    FLT_DAYS_PRIOR
    INFLATION_LINKED_INDICATOR
    DAYS_TO_SETTLE
    TYPE_OF_BOND
    REFERENCE_INDEX
    BASE_CPI
    CFI_CODE
    CPN_FREQ_YLD_CNV
    DAY_PAYER_FREQ
    EX_DIV_DAYS
    EX_DIV_CALENDAR
    CONTINGENT_CONVERSION
    CONTRIB_DATA_INDICATOR
    SECURITY_FACTORABLE
    ID_BB_GLOBAL
    END-OF-FIELDS
    
     TIMESTARTED=Thu Jun  9 18:34:19 EDT 2011
     230 START-OF-DATA
     231 9999X01M9 Govt|-1|198|XIB|0|20110707| |WI TSY BILL|WI TSY BILL|USGN|Govt| |NONE|NORMAL|DISCOUNT|2|US GOVT|US|USD| |31782000000|31782000000|100|100| | | |     100.00000| | |20110106| |20110609|0.005000| | | | | | | | | | | | | | | | | | | | | |349057|13714872|9127952X8| | | | | | | | | | | | | | | | | | | | | |      |N|N|N|218252|United States of America|3352Z|US| |Government|Sovereign|Sovereign| |US|XIB 07/07/11| | |US GOVERNMENT| | | | | | |N.A.|GV9999X01M9|United      States Treasury Bill - WI Post Auction|USD|USD| | |Y|2|N| |USD|N.A.|N.A.|N| |WI TSY BILL| | | | | |N|Y|US|US| |20110707|N.A.|5| | | | | | | | |Y| | | |      |N.A.|N.A.|N.A.|N.A.|N.A.| | | | | | | | |N.A.|N| |Non-Grandfathered|9999X01M9|459|Bill| | | | | |N| | | | |N| | |20110707| | | | | | | |N|1| | | | | | |      | | | |N|BBG001CSH9Y7|
    
    
    

    Since I have 4 primary keys, I though of doing the code using Hashes.

    Could someone please help me out. I am looking for a desperate help on this.

    Really appreciated if someone could help out with some ideas.
     

Share This Page

  1. This site uses cookies to help personalise content, tailor your experience and to keep you logged in if you register.
    By continuing to use this site, you are consenting to our use of cookies.
    Dismiss Notice