Merging two files in unix, with one common column that is redundant



I have two files with one common column that is redundant. File 1 has chromosomal locations and TFs; file 2 has chromosomal locations and Refseq numbers.

File 1:

chr1:66997824-67000456      ZNF333
chr1:66997824-67000456      EGR1
chr1:66997824-67000456      MZF-1
chr22:51221989-51222166      Zic2
chr22:51221989-51222166      ZF5

File 2:

chr1:66997824-67000456      Refseq#1
chr22:51221989-51222166      Refseq#22

I would like to merge these two files and create a new file with three columns:

chr1:66997824-67000456      ZNF333      Refseq#1
chr1:66997824-67000456      EGR1      Refseq#1
chr1:66997824-67000456      MZF-1      Refseq#1
chr22:51221989-51222166      Zic2      Refseq#22
chr22:51221989-51222166      ZF5      Refseq#22

Since the chromosomal locations are redundant, I could not merge them using join in Unix. Is there a way to merge them using sed or awk?










unix sed awk






asked Oct 31 '15 at 14:41
– ABB













  • Please take a look at editing-help.
    – Cyrus
    Oct 31 '15 at 16:08

  • Try the join command.
    – 2991ambusher
    Oct 31 '15 at 16:17



















1 Answer
join file1 file2

Output:

chr1:66997824-67000456 ZNF333 Refseq#1
chr1:66997824-67000456 EGR1 Refseq#1
chr1:66997824-67000456 MZF-1 Refseq#1
chr22:51221989-51222166 Zic2 Refseq#22
chr22:51221989-51222166 ZF5 Refseq#22

or, to separate the columns with five spaces:

join file1 file2 | awk '{OFS="     ";print $1,$2,$3}'

Output:

chr1:66997824-67000456     ZNF333     Refseq#1
chr1:66997824-67000456     EGR1     Refseq#1
chr1:66997824-67000456     MZF-1     Refseq#1
chr22:51221989-51222166     Zic2     Refseq#22
chr22:51221989-51222166     ZF5     Refseq#22
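Since the question asks about awk, here is a minimal alternative sketch that is not part of the original answer. It assumes both files are whitespace-separated, that each location appears only once in file 2, and that file 2 is small enough to hold in memory. The temporary directory and the variable names `ref` and `merged` are placeholders chosen for illustration. Unlike join, this approach does not need sorted input.

```shell
# Recreate the sample inputs from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' \
  'chr1:66997824-67000456 ZNF333' \
  'chr1:66997824-67000456 EGR1' \
  'chr1:66997824-67000456 MZF-1' \
  'chr22:51221989-51222166 Zic2' \
  'chr22:51221989-51222166 ZF5' > "$tmp/file1"
printf '%s\n' \
  'chr1:66997824-67000456 Refseq#1' \
  'chr22:51221989-51222166 Refseq#22' > "$tmp/file2"

# While reading the first argument (file2), NR equals FNR, so each
# location is stored as a key with its Refseq value.
# While reading file1, lines whose location is a known key are printed
# with the stored Refseq value appended as a third column.
merged=$(awk 'NR == FNR { ref[$1] = $2; next }
              $1 in ref { print $1, $2, ref[$1] }' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$merged"
rm -rf "$tmp"
```

Lines of file 1 whose location is missing from file 2 are silently dropped here; printing them with a placeholder third column would be a small change to the second rule.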






answered Oct 31 '15 at 16:11, edited Oct 31 '15 at 16:56
– Cyrus













  • It didn't work; it exited after printing the first Refseq#. My first file has 2000+ records with the chromosomal location chr1:66997824-67000456. The join command printed Refseq#1 for those 2000+ records and then exited. My input file has roughly 20000 lines, whereas the output has only 2000+ lines.
    – ABB
    Nov 2 '15 at 15:27

  • @ABB: It should work, but only if the files are sorted. Use sort if they are not. Otherwise, are you sure there is no mismatch between the first fields of the two files? To test, try first with a subset, e.g. by running uniq -w 24 on file 1, and try to track down your bug.
    – Joce
    Nov 3 '15 at 15:49

  • @ABB: Try this: join <(sort file1) <(sort file2)
    – Cyrus
    Nov 5 '15 at 6:38
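The two suggestions in these comments can be combined into one sketch, given here as an illustration rather than a confirmed fix for the problem reported above. It assumes whitespace-separated files and uses temporary files instead of process substitution so that it also runs in a plain POSIX shell. The `chr3:100-200 TFX` line is made-up data, added only to show how a key without a partner surfaces, and the names `joined` and `unmatched` are placeholders.

```shell
tmp=$(mktemp -d)
# Deliberately unsorted copy of file 1, plus one invented key
# (chr3:100-200) that has no partner in file 2.
printf '%s\n' \
  'chr22:51221989-51222166 Zic2' \
  'chr1:66997824-67000456 ZNF333' \
  'chr3:100-200 TFX' \
  'chr1:66997824-67000456 EGR1' > "$tmp/file1"
printf '%s\n' \
  'chr22:51221989-51222166 Refseq#22' \
  'chr1:66997824-67000456 Refseq#1' > "$tmp/file2"

# Sort both files on the first field, using the same locale for sort
# and join so that both tools agree on the ordering.
LC_ALL=C sort -k1,1 "$tmp/file1" > "$tmp/file1.sorted"
LC_ALL=C sort -k1,1 "$tmp/file2" > "$tmp/file2.sorted"

# Join on the first field (the default).
joined=$(LC_ALL=C join "$tmp/file1.sorted" "$tmp/file2.sorted")

# -v 1 prints the lines of the first file that found no partner,
# which helps to track down mismatched keys.
unmatched=$(LC_ALL=C join -v 1 "$tmp/file1.sorted" "$tmp/file2.sorted")

printf 'joined:\n%s\n' "$joined"
printf 'unmatched in file1:\n%s\n' "$unmatched"
rm -rf "$tmp"
```

If `unmatched` is empty and the joined output is still shorter than expected, invisible differences in the key column, such as trailing carriage returns, would be one thing to check.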








































