Joining CSV files in Ubuntu


I'd like to join csv files in Ubuntu.

file_A.csv:

ID_a, ID_b, a,  b,  c
key_a, A,   a1, b1, c1
key_a, B,   a2, b2, c2
key_b, A,   a3, b3, c3

file_B.csv:

ID_a, ID_b, d,  e,  f
key_a, A,   d1, e1, f1
key_a, B,   d2, e2, f2
key_b, A,   d3, e3, f3

join_AB.csv:

ID_a, ID_b, a, b,  c,  d,  e,  f
key_a, A,  a1, b1, c1, d1, e1, f1
key_a, B,  a2, b2, c2, d2, e2, f2
key_b, A,  a3, b3, c3, d3, e3, f3

The input CSV files should be joined on common columns in their header. Is there a stock solution to this, or should I write my own script to do it?

asked Jul 25 '12 at 14:58

Andrew Wood

Duplicate: stackoverflow.com/questions/2619562/…

– jmetz
Jul 25 '12 at 16:12

ubuntu csv

2 Answers
Try the join command:

NAME
join - join lines of two files on a common field

SYNOPSIS
join [OPTION]... FILE1 FILE2

DESCRIPTION
For each pair of input lines with identical join fields, write a line
to standard output. The default join field is the first, delimited by
whitespace. When FILE1 or FILE2 (not both) is -, read standard input.

So you should be able to do:

join file_A.csv file_B.csv > file_AB.csv

You may have to merge your first and second fields into one for this to work, though, since in essence they form a single key anyway.

I just double-checked, and it seems to work as long as your files have a format such as:

file_A.csv

ID_aID_b, a,  b,  c
key_aA,   a1, b1, c1
key_aB,   a2, b2, c2
key_bA,   a3, b3, c3

as I mentioned above.
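For anyone who wants to try the merged-key approach, here is a minimal Python sketch of the "merge the ID columns, then split them again" step. It is only an illustration under assumptions: the helper names `merge_key_columns` and `split_key_column` and the `|` separator are made up for this example, and the separator is assumed never to occur inside an ID value.

```python
import csv
import io

SEP = "|"  # hypothetical separator; assumed never to appear inside an ID value


def merge_key_columns(text, n_keys=2):
    """Fuse the first n_keys columns of CSV text into one composite key column."""
    # skipinitialspace drops the padding that follows each comma in the sample
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        if not row:
            continue  # ignore blank lines
        writer.writerow([SEP.join(row[:n_keys])] + row[n_keys:])
    return out.getvalue()


def split_key_column(text):
    """Undo merge_key_columns: split the composite key back into ID columns."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        if not row:
            continue  # ignore blank lines
        writer.writerow(row[0].split(SEP) + row[1:])
    return out.getvalue()


FILE_A = """\
ID_a, ID_b, a,  b,  c
key_a, A,   a1, b1, c1
key_a, B,   a2, b2, c2
key_b, A,   a3, b3, c3
"""

merged = merge_key_columns(FILE_A)
print(merged)
print(split_key_column(merged))
```

Two caveats if the merged files are then passed to `join`: it splits fields on whitespace by default, so `-t,` is needed to make the comma the delimiter, and it expects both inputs to be sorted on the join field.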

edited Jul 25 '12 at 15:13

answered Jul 25 '12 at 15:05

jmetz


I don't think this'll work for me. I'd need to do scripting to merge then split the ID columns, so it would be just as easy to script the join.

– Andrew Wood
Jul 25 '12 at 15:44

@ajwood: That's unfortunate - in that case some amount of scripting will likely be needed.

– jmetz
Jul 25 '12 at 16:04

@ajwood - see my comment on the question itself - there is a very similar question already posted on stackoverflow.

– jmetz
Jul 25 '12 at 16:17

Here is my solution in Python:

import csv
import sys


def main(args):
    if not args:
        return 'usage: pass the CSV files to join as arguments'

    # Store the header of each input file
    headers = []
    keys = None

    # Intersect the headers to get our keys (columns common to all files)
    for arg in args:
        with open(arg, newline='') as f:
            # skipinitialspace ignores the padding after each comma
            curr = next(csv.reader(f, skipinitialspace=True))
        headers.append(curr)
        if keys is None:
            keys = curr
        else:
            # Keep the column order of the first file
            keys = [k for k in keys if k in curr]

    # New header: keys first, then the remaining columns of each file
    header = list(keys)
    for h in headers:
        header += [k for k in h if k not in keys]

    # Join data: gather the columns of every file under their key tuple
    data = {}
    for arg in args:
        with open(arg, newline='') as f:
            reader = csv.DictReader(f, skipinitialspace=True)
            for line in reader:
                data_key = tuple(line[k] for k in keys)
                row = data.setdefault(data_key, {})
                for k in header:
                    if k in line:
                        row[k] = line[k]

    # Drop keys that are missing data (keys not present in all files)
    data = {key: row for key, row in data.items()
            if all(col in row for col in header)}

    # Dump data
    writer = csv.writer(sys.stdout, lineterminator='\n')
    writer.writerow(header)
    for key in sorted(data):
        writer.writerow([data[key][col] for col in header])


if __name__ == '__main__':
    sys.exit(main(sys.argv[1:]))
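To try the idea without creating files, the following is a small self-contained sketch of the same technique (intersect the headers to get the key columns, then keep only the rows whose key appears in every input), run on the sample data from the question. The function names `read_rows` and `inner_join` are illustrative and not part of the script above, and the padding after each comma is assumed to be insignificant.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text into (header, list of dict rows), ignoring padding spaces."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    rows = list(reader)
    return reader.fieldnames, rows


def inner_join(texts):
    """Inner-join CSV texts on the columns common to all of their headers."""
    tables = [read_rows(t) for t in texts]
    first_header = tables[0][0]
    # Key columns: present in every header, kept in the first file's order
    keys = [c for c in first_header if all(c in h for h, _ in tables)]
    # Output header: key columns first, then each file's remaining columns
    header = list(keys)
    for h, _ in tables:
        header += [c for c in h if c not in header]
    # Start from the first table, then keep only keys found in each later one
    joined = {tuple(r[k] for k in keys): dict(r) for r in tables[0][1]}
    for _, rows in tables[1:]:
        index = {tuple(r[k] for k in keys): r for r in rows}
        joined = {k: {**v, **index[k]} for k, v in joined.items() if k in index}
    return header, [[joined[k][c] for c in header] for k in sorted(joined)]


FILE_A = """\
ID_a, ID_b, a,  b,  c
key_a, A,   a1, b1, c1
key_a, B,   a2, b2, c2
key_b, A,   a3, b3, c3
"""

FILE_B = """\
ID_a, ID_b, d,  e,  f
key_a, A,   d1, e1, f1
key_a, B,   d2, e2, f2
key_b, A,   d3, e3, f3
"""

header, rows = inner_join([FILE_A, FILE_B])
print(",".join(header))
for row in rows:
    print(",".join(row))
```

Note that this sketch keeps one row per key: if a key occurs more than once within a file, later rows overwrite earlier ones instead of producing every combination.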

answered Jul 25 '12 at 20:56

Andrew Wood
