Merging two files in Unix, with one common column that is redundant


I have two files with one common column whose values are redundant (the same location appears on several lines). File 1 has chromosomal locations and TFs (transcription factors); file 2 has chromosomal locations and RefSeq numbers.

File 1:

chr1:66997824-67000456      ZNF333
chr1:66997824-67000456      EGR1
chr1:66997824-67000456      MZF-1
chr22:51221989-51222166      Zic2
chr22:51221989-51222166      ZF5

File 2:

chr1:66997824-67000456      Refseq#1
chr22:51221989-51222166      Refseq#22

I would like to merge these two files and create a new file with three columns:

chr1:66997824-67000456      ZNF333      Refseq#1
chr1:66997824-67000456      EGR1      Refseq#1
chr1:66997824-67000456      MZF-1      Refseq#1
chr22:51221989-51222166      Zic2      Refseq#22
chr22:51221989-51222166      ZF5      Refseq#22

Since the chromosomal locations are redundant (repeated), I could not merge the files using join in Unix. Is there a way to merge them using sed or awk?

asked Oct 31 '15 at 14:41

ABB

Please take a look at editing-help.

– Cyrus
Oct 31 '15 at 16:08

Try the join command.

– 2991ambusher
Oct 31 '15 at 16:17

Tags: unix sed awk
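The question asks whether awk can do this merge. Below is a minimal awk sketch, assuming whitespace-separated columns and at most one line per location in file 2. It loads file 2 into an associative array keyed on the location and then looks up each line of file 1, so neither file has to be sorted and repeated locations in file 1 are not a problem. The names file1 and file2 follow the answer's usage; the scratch directory, the tab output separator and the output name merged.txt are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch: merge file1 (location, TF) with file2 (location, RefSeq) in awk.
# Assumes whitespace-separated columns and at most one line per location
# in file2. Neither input needs to be sorted.

# Work in a scratch directory so no existing files are overwritten.
tmpdir=$(mktemp -d) || exit 1
cd "$tmpdir" || exit 1

# Sample rows from the question.
printf '%s\n' \
  'chr1:66997824-67000456 ZNF333' \
  'chr1:66997824-67000456 EGR1' \
  'chr1:66997824-67000456 MZF-1' \
  'chr22:51221989-51222166 Zic2' \
  'chr22:51221989-51222166 ZF5' > file1
printf '%s\n' \
  'chr1:66997824-67000456 Refseq#1' \
  'chr22:51221989-51222166 Refseq#22' > file2

# While awk reads the first file named (file2), NR == FNR holds: store each
# RefSeq number keyed on its location and skip to the next line.
# For file1, print location, TF and the stored RefSeq number whenever the
# location was seen in file2; lines with no match are dropped.
awk 'BEGIN { OFS = "\t" }
     NR == FNR { refseq[$1] = $2; next }
     ($1 in refseq) { print $1, $2, refseq[$1] }' file2 file1 > merged.txt

cat merged.txt
```

One caveat of the NR == FNR idiom: if file 2 is empty, the condition stays true while file 1 is read, so nothing is printed. Setting OFS to a run of spaces instead of a tab reproduces the layout shown in the question.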


1 Answer

join file1 file2

Output:

chr1:66997824-67000456 ZNF333 Refseq#1
chr1:66997824-67000456 EGR1 Refseq#1
chr1:66997824-67000456 MZF-1 Refseq#1
chr22:51221989-51222166 Zic2 Refseq#22
chr22:51221989-51222166 ZF5 Refseq#22

join file1 file2 | awk '{OFS="     ";print $1,$2,$3}'

Output:

chr1:66997824-67000456     ZNF333     Refseq#1
chr1:66997824-67000456     EGR1     Refseq#1
chr1:66997824-67000456     MZF-1     Refseq#1
chr22:51221989-51222166     Zic2     Refseq#22
chr22:51221989-51222166     ZF5     Refseq#22

edited Oct 31 '15 at 16:56

answered Oct 31 '15 at 16:11

Cyrus


It didn't work; it stopped after printing the first RefSeq#. My first file has 2000+ records with the chromosomal location chr1:66997824-67000456, and the join command printed Refseq#1 for those 2000+ records and then exited. My input file has roughly 20,000 lines, whereas the output has only 2000+ lines.

– ABB
Nov 2 '15 at 15:27

@ABB: It should work, but only IF THE FILES ARE SORTED. Use sort if they are not. Otherwise, are you sure there is no mismatch between the first fields of the two files? To test, try a subset first, e.g. using uniq -w 24 on file 1, and try to track down your bug.

– Joce
Nov 3 '15 at 15:49

@ABB: Try this: join <(sort file1) <(sort file2)

– Cyrus
Nov 5 '15 at 6:38
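The two comments above point at a likely cause of join stopping early: join only pairs lines correctly when both inputs are sorted on the join field in the same order. Here is a minimal POSIX sh sketch under that assumption: it sorts both files on field 1, joins them, and then uses join -v to list lines that found no partner, which is one way to run the mismatch check suggested above. The rows are the question's sample data in a shuffled order plus one hypothetical file 2 row (chr2:100-200 Refseq#2) that has no partner in file 1. LC_ALL=C, the scratch directory and the output file names are assumptions; temporary files stand in for bash's <(...) so the script also runs under plain sh.

```shell
#!/bin/sh
# Sketch: sort both inputs on field 1, join them, then report unpaired keys.
# Assumes whitespace-separated columns with the location in field 1.

# Work in a scratch directory so no existing files are overwritten.
tmpdir=$(mktemp -d) || exit 1
cd "$tmpdir" || exit 1

# Question's sample rows in shuffled order, plus one hypothetical
# file2 row (chr2:100-200) that has no partner in file1.
printf '%s\n' \
  'chr22:51221989-51222166 Zic2' \
  'chr1:66997824-67000456 ZNF333' \
  'chr22:51221989-51222166 ZF5' \
  'chr1:66997824-67000456 EGR1' \
  'chr1:66997824-67000456 MZF-1' > file1
printf '%s\n' \
  'chr22:51221989-51222166 Refseq#22' \
  'chr2:100-200 Refseq#2' \
  'chr1:66997824-67000456 Refseq#1' > file2

# join requires both inputs sorted on the join field in the same order;
# running sort and join in the C locale keeps their comparisons consistent.
LC_ALL=C sort -k1,1 file1 > file1.sorted
LC_ALL=C sort -k1,1 file2 > file2.sorted

# Repeated keys in file1 are fine: join pairs each of them with file2's line.
LC_ALL=C join file1.sorted file2.sorted > merged.txt

# -v 1 / -v 2 print only the lines of file 1 / file 2 that found no
# partner, a quick way to spot mismatched first fields.
LC_ALL=C join -v 1 file1.sorted file2.sorted > unmatched1.txt
LC_ALL=C join -v 2 file1.sorted file2.sorted > unmatched2.txt

cat merged.txt
echo "unmatched in file1: $(wc -l < unmatched1.txt)"
echo "unmatched in file2: $(wc -l < unmatched2.txt)"
```

In bash, the two sort commands and the first join correspond to the one-liner join <(sort file1) <(sort file2); the -k1,1 option limits the sort key to the field that join actually compares.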
