Joining CSV files in Ubuntu


I'd like to join csv files in Ubuntu.



file_A.csv:
ID_a, ID_b, a, b, c
key_a, A, a1, b1, c1
key_a, B, a2, b2, c2
key_b, A, a3, b3, c3

file_B.csv:
ID_a, ID_b, d, e, f
key_a, A, d1, e1, f1
key_a, B, d2, e2, f2
key_b, A, d3, e3, f3

join_AB.csv:
ID_a, ID_b, a, b, c, d, e, f
key_a, A, a1, b1, c1, d1, e1, f1
key_a, B, a2, b2, c2, d2, e2, f2
key_b, A, a3, b3, c3, d3, e3, f3


The input CSV files should be joined on common columns in their header. Is there a stock solution to this, or should I write my own script to do it?










ubuntu csv






asked Jul 25 '12 at 14:58









Andrew Wood





Duplicate: stackoverflow.com/questions/2619562/…

– jmetz
Jul 25 '12 at 16:12





2 Answers
Try the join command:

    NAME
        join - join lines of two files on a common field

    SYNOPSIS
        join [OPTION]... FILE1 FILE2

    DESCRIPTION
        For each pair of input lines with identical join fields, write a line
        to standard output. The default join field is the first, delimited by
        whitespace. When FILE1 or FILE2 (not both) is -, read standard input.




So you should be able to do:

    join file_A.csv file_B.csv > file_AB.csv
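For illustration, here is a rough Python emulation of what join does by default, run on the data rows from the question (header omitted). It is only a sketch under assumptions: the function name `join_on_first_field` is made up here, and unlike this loop the real join expects both inputs to be sorted on the join field. Because the first whitespace-delimited field is only `key_a,` or `key_b,`, every `key_a,` line in one file pairs with every `key_a,` line in the other, which is not the row-by-row result the question asks for.

```python
def join_on_first_field(lines_a, lines_b):
    """Pair up lines whose first whitespace-delimited field is identical."""
    joined = []
    for a in lines_a:
        key_a, *rest_a = a.split()
        for b in lines_b:
            key_b, *rest_b = b.split()
            if key_a == key_b:
                # Join field first, then the rest of each line, space-separated
                joined.append(" ".join([key_a, *rest_a, *rest_b]))
    return joined


# Data rows from the question, used as-is
file_a = ["key_a, A, a1, b1, c1", "key_a, B, a2, b2, c2", "key_b, A, a3, b3, c3"]
file_b = ["key_a, A, d1, e1, f1", "key_a, B, d2, e2, f2", "key_b, A, d3, e3, f3"]

rows = join_on_first_field(file_a, file_b)
for row in rows:
    print(row)
```

The two `key_a,` lines in each file produce four combinations, plus one for `key_b,`, so the sketch yields five rows instead of the three wanted.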


You may have to merge your first and second fields into one for this to work, though; join matches on a single field, and together your two ID columns act as one key anyway.

I just double-checked, and it seems to work as long as your files have a format like this:

    file_A.csv
    ID_aID_b, a, b, c
    key_aA, a1, b1, c1
    key_aB, a2, b2, c2
    key_bA, a3, b3, c3

as I mentioned above.
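Merging the two ID fields could be scripted along these lines. This is a minimal sketch, not part of the original answer: the helper names `merge_keys` and `split_keys` and the `|` separator are assumptions made for illustration, and the separator is assumed never to occur in the data, so that the merged key can be split again after joining.

```python
SEP = "|"  # hypothetical separator, assumed absent from the data


def merge_keys(line, n_keys=2):
    """Fuse the first n_keys comma-separated fields into a single field."""
    fields = [f.strip() for f in line.split(",")]
    return ", ".join([SEP.join(fields[:n_keys]), *fields[n_keys:]])


def split_keys(line):
    """Undo merge_keys: expand the fused first field into separate fields."""
    first, _, rest = line.partition(",")
    keys = ", ".join(first.split(SEP))
    return keys + "," + rest if rest else keys


merged = merge_keys("key_a, A, a1, b1, c1")
print(merged)              # key_a|A, a1, b1, c1
print(split_keys(merged))  # key_a, A, a1, b1, c1
```

Each input file would go through `merge_keys` line by line before joining, and the joined output through `split_keys` afterwards.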







edited Jul 25 '12 at 15:13

answered Jul 25 '12 at 15:05

jmetz













  • I don't think this'll work for me. I'd need to script merging and then splitting the ID columns, so it would be just as easy to script the join itself.

    – Andrew Wood
    Jul 25 '12 at 15:44

  • @ajwood: That's unfortunate - in that case some amount of scripting will likely be needed.

    – jmetz
    Jul 25 '12 at 16:04

  • @ajwood - see my comment on the question itself - there is a very similar question already posted on Stack Overflow.

    – jmetz
    Jul 25 '12 at 16:17



















Here is my solution in Python:

    import csv
    import sys


    def main(args):
        # Store each header we read
        headers = []

        # Intersect the headers to get our keys, keeping the first file's order
        keys = None
        for arg in args:
            with open(arg, newline='') as f:
                curr = next(csv.reader(f))
            headers.append(curr)
            if keys is None:
                keys = list(curr)
            else:
                keys = [k for k in keys if k in curr]

        # New header: the keys first, then each file's remaining columns
        header = list(keys)
        for h in headers:
            header += [k for k in h if k not in keys]

        # Join data, indexed by the tuple of key values
        data = {}
        for arg in args:
            with open(arg, newline='') as f:
                for line in csv.DictReader(f):
                    data_key = tuple(line[k] for k in keys)
                    row = data.setdefault(data_key, {})
                    for k in header:
                        if k in line:
                            row[k] = line[k]

        # Drop keys that are missing data (keys not present in all files)
        for key in list(data):
            if any(col not in data[key] for col in header):
                del data[key]

        # Dump data
        print(','.join(header))
        for key in sorted(data):
            print(','.join(data[key][col] for col in header))


    if __name__ == '__main__':
        sys.exit(main(sys.argv[1:]))
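One thing to be aware of: building each output line with `','.join(...)` does not quote fields that themselves contain a comma. As a hedged alternative for the "Dump data" step, the sketch below writes rows with the standard library's `csv.writer`, which quotes such fields by default. The helper name `dump_rows` and the sample values are made up for illustration.

```python
import csv
import io


def dump_rows(header, rows, out):
    """Write the header and rows as CSV, quoting fields only where needed."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


# Illustrative data: the second field contains a comma and must be quoted
buf = io.StringIO()
dump_rows(["ID_a", "note"], [["key_a", "a1, with comma"]], buf)
print(buf.getvalue(), end="")
```

In the script itself, `out` would be `sys.stdout` rather than an in-memory buffer.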






answered Jul 25 '12 at 20:56

Andrew Wood





























